Abstract

Continuous-time state-space models (SSMs) are flexible tools for analysing irregularly sampled sequential observations that are driven by an underlying state process. Corresponding applications typically involve restrictive assumptions concerning linearity and Gaussianity to facilitate inference on the model parameters via the Kalman filter. In this contribution, we provide a general continuous-time SSM framework, allowing both the observation and the state process to be non-linear and non-Gaussian. Statistical inference is carried out by maximum approximate likelihood estimation, where multiple numerical integration within the likelihood evaluation is performed via a fine discretization of the state process. The corresponding reframing of the SSM as a continuous-time hidden Markov model, with structured state transitions, enables us to apply the associated efficient algorithms for parameter estimation and state decoding. We illustrate the modelling approach in a case study using data from a longitudinal study on delinquent behaviour of adolescents in Germany, revealing temporal persistence in the deviation of an individual's delinquency level from the population mean.

Keywords

hidden Markov model (HMM), irregular time intervals, non-Gaussian and non-linear processes, Ornstein-Uhlenbeck process, sequential data

1 Introduction

State-space models (SSMs) are flexible tools for analysing sequential observations that depend on underlying non-observable states, with interest and hence inference typically centred on the states. There are two main conceptual decisions to be made when tailoring an SSM to any given application, concerning (a) the nature of the state space and (b) whether the state process is defined as operating in discrete or continuous time. Regarding (a), the nature of the state space depends on the interpretation of the latent variable. The latter could relate either to discrete states, for example indicating an individual's health status (e.g., infected versus not infected; Conn and Cooch, 2009) or an animal's behavioural modes (e.g., travelling, resting, and foraging; van Beest et al., 2019), or to continuous states, for example related to the nervousness of the financial market (e.g., within stochastic volatility models; Kim et al., 1998) or to an athlete's current form (e.g., in analyses of serial correlation in performance; Ötting et al., 2020). In some applications, the specification of the state space is obvious (e.g., in simple capture-recapture studies, with states corresponding to dead and alive; King and Langrock, 2016), whereas in others it constitutes a modelling choice (e.g., in stochastic volatility modelling, where the market states are commonly considered to be continuous, but sometimes dichotomized to calm and nervous, respectively; Bulla and Bulla, 2006). Regarding (b), that is, the decision whether the model is defined as operating in discrete or continuous time, the time formulation is usually determined by the sampling scheme of the data at hand. While discrete-time models are appropriate for time series with regular time intervals, continuous-time models are more suitable for irregularly spaced observations. However, as irregularly sampled data can often be augmented via imputation to give a regular series, or temporally aggregated to yield regularly spaced observations, the choice of the time formulation is not necessarily trivial.

With these two dimensions along which a conceptual modelling decision needs to be made, we distinguish four possible formulations of SSMs as presented in Table 1, with either discrete or continuous states and operating in either discrete or continuous time. We refer to models with finite state space as hidden Markov models (HMMs), using the label SSM to refer to models with infinitely many and usually continuous-valued states (a distinction commonly made in the literature, though some authors refer to both model classes as HMMs; e.g., Cappé et al., 2005). In terms of statistical inference, discrete-time HMMs arguably constitute the simplest case from Table 1. In particular, for these models, there are recursive schemes, for example to evaluate the likelihood, that are applicable under various dependence structures and flexible distributional assumptions. The continuous-time formulation of HMMs is only slightly more involved, with the underlying state process then governed by a continuous-time (rather than a discrete-time) Markov chain. The main inferential tools available for discrete-time HMMs are applicable also in continuous time (Jackson et al., 2003), though some extensions, for example to accommodate time-varying covariates, are no longer straightforward (e.g., Michelot and Blackwell, 2019; Mews et al., in press; Williams et al., 2020). When considering a model in discrete time but with a continuous state space, that is, an SSM, inference is straightforward only in the linear and Gaussian case, for which the Kalman filter is applicable (e.g., McCrea et al., 2010; Durbin and Koopman, 2012). In the more general case, inference is hindered by the fact that the likelihood contains multiple integrals, making direct evaluation difficult (e.g., Kitagawa, 1987; Bartolucci and De Luca, 2003; Langrock, 2011).

Table 1

Possible formulations of state-space models

	Latent variable (state): discrete	Latent variable (state): continuous
Discrete time	HMMs	SSMs
Continuous time	continuous-time HMMs	continuous-time SSMs

Despite these difficulties that arise when extending (discrete-time) HMMs either to have a continuous state space or to be formulated in continuous time, the corresponding extensions are nevertheless well covered in the existing literature and are fairly routinely applied. In this article, we focus on the fourth case from Table 1, that is, SSMs that are formulated in continuous time (and are not necessarily linear and Gaussian). Such models, which are not nearly as well documented in the literature as the other three classes from Table 1, are relevant in the context of irregularly sampled data in conjunction with an underlying continuous-valued state process. In particular, irregularly spaced observations are quite common in datasets on natural phenomena such as earthquakes (e.g., Beyreuther et al., 2008), in medical data (e.g., Amoros et al., 2019), or in survey data, which for example relate to psychological measurements (e.g., Oravecz et al., 2011). While continuous-time modelling can sometimes be avoided even under irregular sampling, for example using imputation methods as in Kim and Stoffer (2008), continuous-time SSMs are more realistic and flexible than models that assume simplifications of either the time formulation or the nature of the latent variable. In some applications, continuous-time SSMs with a diffusion state process are considered (see, e.g., Niu et al., 2016; Lavielle, 2018; Michelot et al., 2021), but except for Albertsen et al. (2015), who use t-distributed measurement errors, both the state and the observation process are usually assumed to be linear and Gaussian to allow for the application of the Kalman filter (see, e.g., Johnson et al., 2008; Tandeo et al., 2011; Dennis and Ponciano, 2014; Koopman et al., 2018; Jonsen et al., 2020). In particular, a more general modelling framework for formulating and estimating SSMs in continuous time is still lacking. As most of the continuous-time SSMs mentioned above focus on a specific type of application, no off-the-shelf tools are readily available for general, that is, possibly non-linear and non-Gaussian, modelling of irregularly spaced sequential data driven by a latent state process.

In our contribution, we present a flexible framework for continuous-time SSMs, allowing both the observation process and the state process to be non-linear and non-Gaussian. Our approach thus enables a variety of possible model specifications, requiring only that the transition density of the state process has an explicit analytical form (though even this can in fact be relaxed; see Section 2). The latter condition is satisfied by all linear processes, including the Ornstein-Uhlenbeck (OU) process, as well as the non-linear geometric Brownian motion and the Cox-Ingersoll-Ross process. As the model's likelihood involves intractable integration over all possible realizations of the continuous-valued state process at each observation time, we follow ideas from Kitagawa (1987), Bartolucci and De Luca (2003), and Langrock (2011) and approximate the integral by finely discretizing the state space. This approximation can be regarded as a reframing of the model as a continuous-time HMM with a large but finite number of states, enabling us to apply the corresponding efficient algorithms. This state-space discretization trick to facilitate inference has in fact been used before to fit continuous-time SSMs, specifically for filtering fish movement tracks from noisy position measurements (Pedersen et al., 2008; Thygesen et al., 2009; Pedersen et al., 2011a). We apply effectively the same techniques, though presenting the model framework in general terms rather than focusing on any specific application. The key strength of the approach is its great flexibility to easily consider virtually any type of non-linear and non-Gaussian SSM, whereas most continuous-time SSMs in the literature are bound to specific applications.

In Section 2, we first discuss statistical inference for continuous-time SSMs based on approximating the likelihood via state discretization. Subsequently, in Section 3, we demonstrate the feasibility of our approach and investigate the estimation accuracy in simulation experiments. An illustrating case study on delinquent behaviour of adolescents is presented in Section 4.

2 Methodology

We consider a sequence of random variables, $Y_{t_{1}}, \dots, Y_{t_{T}}$, observed at discrete but irregularly spaced time points $t_{0}, t_{1}, \dots, t_{T}$, where $0 = t_{0} < t_{1} < \dots < t_{T}$. This observation process is assumed to be driven by an underlying and non-observable continuous-valued state process, $\{X_{t}\}_{t \geq 0}$, which operates in continuous time and is assumed to be Markovian. Conditional on $X_{t_{k}}$, that is, the state at time $t_{k}$, $Y_{t_{k}}$ is assumed to be independent of all other observations and states. The model is thus specified via the conditional distributions

$$Y_{t_{k}} | X_{t_{k}} \quad \text{and} \quad X_{t_{k}} | X_{t_{k - 1}}. \tag{2.1}$$

In the following, we assume that the transition density of the state process, that is, the probability density function of $X_{t}$ given $X_{s} = x_{s}$, for $t > s$, is available in closed form. In principle, if no explicit analytical expression is available for the transition density, then approximations can be used. For example, if the time evolution of the state process is given by the forward Kolmogorov equation (or Fokker-Planck equation), the transition density can be calculated using a partial differential equation solver (see, e.g., Thygesen et al., 2009; Pedersen et al., 2011b). Another possibility is to approximate the transition density using the Euler-Maruyama discretization scheme to obtain Gaussian increments over small time steps (see, e.g., Pedersen, 1995a, b; Albertsen, 2019; Michelot et al., 2019). In the following, the transition density is denoted by $p_{Δ} (x_{t} | x_{s})$ with $Δ = t - s$. No restrictive assumptions are made for $Y_{t_{k}} | X_{t_{k}}$, specifically allowing the conditional distribution of the observations to be either continuous or discrete (and even categorical).

One possible choice for the state process is the OU process, which is described by the stochastic differential equation (SDE)

$$dX_{t} = θ (μ - X_{t}) \, dt + σ \, dW_{t}, \quad X_{0} = x_{0}, \tag{2.2}$$

where $θ > 0$ is the drift parameter indicating the strength of reversion to the long-term mean $μ \in \mathbb{R}$, $σ > 0$ controls the strength of fluctuations, and $W_{t}$ denotes the Brownian motion. Due to its mean-reverting property, the OU process is a natural candidate for applications in which the latent variable fluctuates around some equilibrium state.
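To make the dynamics in Equation (2.2) concrete, the following minimal sketch simulates an OU path with the Euler-Maruyama scheme. The function name `simulate_ou_euler`, the seed, and the parameter values are illustrative assumptions rather than part of the authors' code.

```python
import math
import random


def simulate_ou_euler(theta, mu, sigma, x0, t_end, dt, rng):
    """Euler-Maruyama path of dX = theta * (mu - X) dt + sigma dW on [0, t_end]."""
    n_steps = int(round(t_end / dt))
    path = [x0]
    x = x0
    for _ in range(n_steps):
        # Brownian increment over a step of length dt: N(0, dt)
        dw = rng.gauss(0.0, math.sqrt(dt))
        # Drift pulls x towards mu; diffusion adds Gaussian noise
        x = x + theta * (mu - x) * dt + sigma * dw
        path.append(x)
    return path


# Illustrative values only (hypothetical seed and parameters)
rng = random.Random(1)
path = simulate_ou_euler(theta=0.5, mu=0.0, sigma=0.5, x0=0.0,
                         t_end=10.0, dt=0.01, rng=rng)
```

With `sigma` set to zero the path decays deterministically towards `mu`, which isolates the mean-reverting drift term.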

For simplicity of notation, we let $τ = 0, 1, \dots, T$ denote the index of the observation in the time series, such that in the following we use the shorthand notation $Y_{τ}$ to indicate $Y_{t_{τ}}$ , and likewise $X_{τ}$ for $X_{t_{τ}}$ , whenever unambiguous. While $τ$ is an integer, $t_{τ}$ can be any non-negative number and represents the time at which the observation $τ$ was collected. Consequently, $Δ_{τ} = t_{τ} - t_{τ - 1}$ denotes the time difference between consecutive observations.

The likelihood of an SSM as in (2.1) can be calculated by integrating over all possible values of the state process potentially underlying each observation time, resulting in an expression involving $T + 1$ integrals. To evaluate this multiple integral and hence the likelihood, we finely discretize the continuous-valued state space, as first proposed by Kitagawa (1987). Specifically, we define a range of possible values of the state process, $[b_{0}, b_{m}]$, which we divide into $m$ intervals $B_{i} = (b_{i - 1}, b_{i})$, $i = 1, \dots, m$, of equal length $(b_{m} - b_{0}) / m$, requiring both the range $[b_{0}, b_{m}]$ and $m$ to be sufficiently large. Making use of the model's dependence structure and applying numerical integration, the SSM likelihood can then be approximated in the following way:

$$\begin{aligned} L_{T} &= \int \dots \int p (y_{0}, \dots, y_{T}, x_{0}, \dots, x_{T}) \, dx_{T} \dots dx_{0} \\ &= \int \dots \int p (x_{0}) p (y_{0} | x_{0}) \prod_{τ = 1}^{T} p_{Δ_{τ}} (x_{τ} | x_{τ - 1}) p (y_{τ} | x_{τ}) \, dx_{T} \dots dx_{0} \\ &\approx \sum_{i_{0} = 1}^{m} \dots \sum_{i_{T} = 1}^{m} p (x_{0} \in B_{i_{0}}) p (y_{0} | x_{0} = b_{i_{0}}^{*}) \\ &\quad \times \prod_{τ = 1}^{T} p_{Δ_{τ}} (x_{τ} \in B_{i_{τ}} | x_{τ - 1} = b_{i_{τ - 1}}^{*}) p (y_{τ} | x_{τ} = b_{i_{τ}}^{*}), \end{aligned} \tag{2.3}$$

with $b_{i}^{*}$ denoting the midpoint of the interval $B_{i}$ and using $p$ as a general symbol for either a density or a probability. There are alternative ways to approximate the multiple integral (see, e.g., Bartolucci and De Luca, 2003; Zucchini et al., 2016), but which of these is used does not make a difference in practice, provided $m$ is sufficiently large. If a small $m$ is used, for example $m < 20$, then the likelihood approximation may be very inaccurate and the results thus unreliable (see Section 3 for details).

The discretization of the state space into $m$ intervals effectively amounts to an approximation of the SSM by an $m$-state HMM, which allows us to apply the entire HMM methodology to our model. In particular, we can use the HMM forward algorithm to more efficiently calculate the approximate likelihood in Equation (2.3), which as it stands has a computational cost of order $O (T m^{T})$. To recognize the approximation as an HMM, we specify the initial distribution $δ = (δ_{1}, \dots, δ_{m})$ with $δ_{i} = p (x_{0} \in B_{i})$, and define the $i$-th diagonal entry of an $m \times m$ diagonal matrix $P (y_{τ})$ as $p (y_{τ} | x_{τ} = b_{i}^{*})$. Further, we define the $m \times m$ transition probability matrix $Γ_{Δ_{τ}} = (γ_{i, j}^{Δ_{τ}})$ by specifying $γ_{i, j}^{Δ_{τ}} = p_{Δ_{τ}} (x_{τ} \in B_{j} | x_{τ - 1} = b_{i}^{*})$, conditioning on $\{x_{τ - 1} = b_{i}^{*}\}$ instead of $\{x_{τ - 1} \in B_{i}\}$ to avoid the integration over all possible values from $B_{i}$. As indicated by the corresponding superscript, the state transition probabilities $γ_{i, j}^{Δ_{τ}}$ depend on the time difference $Δ_{τ}$ between consecutive observations. Using the HMM forward algorithm to calculate the approximate likelihood in Equation (2.3) reduces the computational cost to order $O (T m^{2})$ and yields the following matrix product:

$$L_{T} \approx δ P (y_{0}) \left( \prod_{τ = 1}^{T} Γ_{Δ_{τ}} P (y_{τ}) \right) \mathbf{1}, \tag{2.4}$$

where $\mathbf{1} \in \mathbb{R}^{m}$ denotes a column vector of ones.
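The matrix product in Equation (2.4) can be evaluated recursively from left to right. The sketch below is one possible implementation, not the authors' code: it rescales the forward vector at every step to avoid numerical underflow (a common device in HMM software, assumed here rather than taken from the text), and the function name `forward_loglik` and the nested-list inputs are hypothetical.

```python
import math


def forward_loglik(delta, emission_probs, trans_mats):
    """Log of delta P(y_0) prod_tau Gamma_tau P(y_tau) 1, computed with scaling.

    delta          : list of m initial state probabilities
    emission_probs : list of T+1 lists, emission_probs[tau][i] = p(y_tau | state i),
                     i.e. the diagonal of P(y_tau)
    trans_mats     : list of T (m x m) nested lists, one matrix per time gap
    """
    m = len(delta)
    # alpha_0 = delta P(y_0)
    alpha = [delta[i] * emission_probs[0][i] for i in range(m)]
    loglik = 0.0
    for tau in range(len(emission_probs)):
        if tau > 0:
            gamma = trans_mats[tau - 1]
            # alpha_tau = alpha_{tau-1} Gamma_tau P(y_tau): O(m^2) work per step
            alpha = [
                sum(alpha[i] * gamma[i][j] for i in range(m)) * emission_probs[tau][j]
                for j in range(m)
            ]
        # Rescale to sum one and accumulate the log of the scaling factor
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


# Toy example with m = 2 states and three observations (made-up numbers)
toy_delta = [0.6, 0.4]
toy_emissions = [[0.5, 0.1], [0.2, 0.7], [0.3, 0.3]]
toy_gammas = [[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.4, 0.6]]]
toy_loglik = forward_loglik(toy_delta, toy_emissions, toy_gammas)
```

Because each step needs a single vector-matrix product, the total cost grows linearly in the number of observations, in line with the $O (T m^{2})$ order stated above.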

The entries in $P (y_{τ})$ are simply the conditional densities or probabilities as determined by the model assumed for the observation process, $Y_{τ} | X_{τ}$ (a concrete example will be given in Section 4). To illustrate how $Γ_{Δ_{τ}}$ is obtained, consider the example of the OU process in Equation (2.2). This process has a Gaussian transition density $p_{Δ} (x_{τ} | x_{τ - 1})$ , such that

$$X_{τ} | X_{τ - 1} = x \sim N \left( e^{- θ Δ_{τ}} x + μ (1 - e^{- θ Δ_{τ}}), \frac{σ^{2}}{2 θ} (1 - e^{- 2 θ Δ_{τ}}) \right),$$

based on which the transition probabilities $γ_{i, j}^{Δ_{τ}}$ can be calculated (see, e.g., Cerbone et al., 1981). For this state process and assuming stationarity, the initial state probabilities $δ_{i}$ can be calculated based on the limiting distribution $X_{τ} \sim N (μ, \frac{σ^{2}}{2 θ})$ of the OU process.
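As an illustration of how $Γ_{Δ_{τ}}$ and $δ$ might be assembled for the OU process, the sketch below evaluates the Gaussian transition distribution at the interval boundaries and takes differences of the normal distribution function. All function names are hypothetical. Probability mass falling outside $[b_{0}, b_{m}]$ is simply dropped here, so the rows sum to slightly less than one; whether and how to renormalize is left open, as the text does not specify it.

```python
import math


def norm_cdf(x, mean, sd):
    """Normal distribution function, expressed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ou_grid(b0, bm, m):
    """Boundaries b_0, ..., b_m and midpoints b*_1, ..., b*_m of m equal intervals."""
    h = (bm - b0) / m
    bounds = [b0 + i * h for i in range(m + 1)]
    mids = [0.5 * (bounds[i] + bounds[i + 1]) for i in range(m)]
    return bounds, mids


def ou_transition_matrix(theta, mu, sigma, dt, bounds, mids):
    """gamma[i][j] = P(X_tau in B_j | X_{tau-1} = b*_i) for a time gap dt."""
    decay = math.exp(-theta * dt)
    sd = math.sqrt(sigma ** 2 / (2.0 * theta) * (1.0 - math.exp(-2.0 * theta * dt)))
    gamma = []
    for x in mids:
        mean = decay * x + mu * (1.0 - decay)
        cdf = [norm_cdf(b, mean, sd) for b in bounds]
        gamma.append([cdf[j + 1] - cdf[j] for j in range(len(mids))])
    return gamma


def ou_initial_distribution(theta, mu, sigma, bounds):
    """delta[i] = P(X_0 in B_i) under the stationary N(mu, sigma^2 / (2 theta))."""
    sd = sigma / math.sqrt(2.0 * theta)
    cdf = [norm_cdf(b, mu, sd) for b in bounds]
    return [cdf[j + 1] - cdf[j] for j in range(len(bounds) - 1)]


# Illustrative values only: m = 50 intervals on [-2.5, 2.5], time gap 1.25
bounds, mids = ou_grid(-2.5, 2.5, 50)
gamma = ou_transition_matrix(0.5, 0.0, 0.5, 1.25, bounds, mids)
delta = ou_initial_distribution(0.5, 0.0, 0.5, bounds)
```

Since $Γ_{Δ_{τ}}$ depends on the time gap, one such matrix is needed for each distinct value of $Δ_{τ}$ in the data.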

Irrespective of the specific assumptions made for $Y_{τ} | X_{τ}$ and $X_{τ} | X_{τ - 1}$, the parameters of the SSM can be estimated by numerically maximizing the approximate likelihood in Equation (2.4), subject to standard technical issues as detailed for example in Zucchini et al. (2016). As the maximum likelihood estimators of HMM parameters are asymptotically normal under standard regularity conditions (Cappé et al., 2005), the inverse of the observed Fisher information can be used to approximate the variance-covariance matrix of the estimators, and based on this standard errors and confidence intervals (CIs) can be calculated. In practice, the range of the state process $[b_{0}, b_{m}]$ as well as the number of intervals $m$ used for the likelihood approximation need to be specified. Regarding the choice of $[b_{0}, b_{m}]$, it is important to cover the essential range of possible values of the state process, which can be examined by looking at the (estimated) stationary distribution of the state process (if available), and by monitoring the latter while fitting the model. A possible approach is to first estimate the model parameters using a rather large range for $[b_{0}, b_{m}]$ to obtain rough parameter estimates, and then to re-fit the model using a more refined range of the state process. For example, for the OU process with $μ = 0$, a conservative choice would be $[- 3 σ / \sqrt{2 θ}, 3 σ / \sqrt{2 θ}]$ (corresponding to three times the standard deviation in either direction, as suggested by Fridman and Harris, 1998). Regarding the choice of $m$, the idea is to use an $m$ large enough such that the approximation of the SSM by the $m$-state HMM is virtually exact (recall that the latter is not of interest in itself, but only serves as a tool to fit the continuous-state model). It is intuitively clear that the more intervals are used, the closer the likelihood can be approximated, but the longer the computation time, a classical trade-off situation. Therefore, to gain some understanding of how many intervals are sufficient for the likelihood approximation, the next section will investigate the effect of $m$ on the estimation accuracy.

3 Simulation experiments

Simulations were conducted to explore the effect of approximating the likelihood by discretizing the continuous-valued state process, in particular with regard to the estimation accuracy. While the likelihood approximation can be rendered arbitrarily accurate by using increasingly many intervals in the discretization, it is not clear at which number of intervals $m$ the parameter estimation stabilizes such that increasing $m$ does not (substantially) change the estimation results anymore. We further investigate if the appropriate number of intervals $m$ needed for the approximation depends on the variability of the underlying state process.

We consider three simulation settings, in which the state process is modelled using the OU process (cf. Equation (2.2)) with long-term mean $μ = 0$. For the drift parameter $θ$ and the diffusion parameter $σ$, we choose three different parameter combinations, $(θ, σ) \in \{(0.02, 0.1), (0.5, 0.5), (2, 1)\}$, which all share the same limiting distribution, namely $X_{τ} \sim N (0, 0.5^{2})$. The variability, as governed by the diffusion parameter $σ$, increases from Setting 1 to Setting 3. Example path realizations of the three state processes considered are shown in Figure 1. The observation process is assumed to be a Poisson-distributed, irregularly spaced sequence of counts with

$$Y_{τ} \sim \text{Poisson} (λ_{τ}), \quad λ_{τ} = \exp (X_{τ}) \, α,$$

such that the mean of the observation process fluctuates (asymmetrically) around $α > 0$. We set $α = 200$ and generate one sequence of $T = 2000$ observations for each setting. The time intervals between consecutive observations are measured in days and were drawn from a Poisson distribution with a mean of 30 hours (the time scales are arbitrary here and are stated merely to aid interpretation). The simulated count data for each setting are shown in Figure A1 in the Online Supplementary Material.
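As a quick check of the statement that all three settings share one limiting distribution, the stationary variance $σ^{2} / (2 θ)$ of the OU process can be worked out for each parameter combination:

$$\frac{0.1^{2}}{2 \cdot 0.02} = \frac{0.5^{2}}{2 \cdot 0.5} = \frac{1^{2}}{2 \cdot 2} = 0.25 = 0.5^{2}.$$

What differs between the settings is the persistence. Using the factor $e^{- θ Δ}$ from the conditional mean of the OU transition density, and taking a unit time lag $Δ = 1$ purely for illustration (an assumption, since the time scale is arbitrary here), one obtains approximately

$$e^{- 0.02} \approx 0.980, \quad e^{- 0.5} \approx 0.607, \quad e^{- 2} \approx 0.135$$

for Settings 1 to 3, respectively, so that Setting 1 corresponds to the most slowly varying state process.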

Figure 1

Example path realizations of the OU processes considered. The graphs were obtained by application of the Euler-Maruyama scheme with initial value 0 and step length 0.01

For parameter estimation, we approximate the likelihood by discretizing the state space as described in Section 2, and vary the number of intervals $m$ used in the approximation. In each setting, we thus repeatedly estimate the model parameters for a single sequence of counts by numerically maximizing the likelihood given in Equation (2.4), considering $m = 20, 30, 50, 100, 150$ intervals and choosing a range of $[b_{0}, b_{m}] = [- 2.5, 2.5]$ for the state process. While we present the estimation results obtained for only a single sample per setting (simply as we aim to illustrate here the effect of the likelihood approximation using different interval numbers $m$, rather than sampling properties of the estimators), we provide the R code for the simulation experiments in the Online Supplementary Material such that similar results can easily be replicated for other realizations of the process.

Table 2

Estimated parameters, computation times and maximum log-likelihood (llk) values in the simulation experiments considering different numbers of intervals $m$ used in the likelihood approximation

Setting 1
	$θ$	$σ$	$α$	comp. time (sec)	–llk
$m = 20$	0.0167	0.106	285.3	6.1	9874.21
$m = 30$	0.0186	0.096	177.9	8.9	9577.25
$m = 50$	0.0164	0.098	167.5	18.1	9542.46
$m = 100$	0.0174	0.101	191.4	37.1	9544.06
$m = 150$	0.0174	0.101	191.5	65.7	9544.15
true values	0.02	0.1	200
Setting 2
	$θ$	$σ$	$α$	comp. time (sec)	–llk
$m = 20$	0.190	0.453	339.6	4.8	12009.62
$m = 30$	0.484	0.495	211.5	7.4	11722.41
$m = 50$	0.494	0.499	193.7	17.2	11715.25
$m = 100$	0.495	0.500	197.7	26.3	11715.03
$m = 150$	0.495	0.500	197.7	41.5	11715.04
true values	0.5	0.5	200
Setting 3
	$θ$	$σ$	$α$	comp. time (sec)	–llk
$m = 20$	2.345	1.080	195.5	8.4	12275.82
$m = 30$	2.343	1.086	194.0	7.9	12083.40
$m = 50$	2.361	1.090	198.4	13.8	12065.85
$m = 100$	2.358	1.091	199.1	35.2	12066.90
$m = 150$	2.358	1.091	199.1	64.6	12066.90
true values	2	1	200

For each simulation setting and the different numbers of intervals $m$ considered, the maximized log-likelihood values (reported in Table 2 as the negative log-likelihood), the estimated parameters, and the computation times are shown in Table 2. The computation time increases with increasing interval numbers, whereas the maximum likelihood values as well as the estimated parameters stabilize with increasing $m$. This is to be expected: given a sufficiently fine discretization, a further increase in the interval numbers does not yield a relevant difference in the likelihood value. There are, however, some differences between the three settings considered: in Setting 1, the estimation results do not stabilize until $m \geq 100$, whereas in the other settings the likelihood values and the estimated parameters change little when the number of intervals is increased beyond $m = 50$. The more variable the underlying OU process (i.e., the larger the diffusion $σ$), the fewer intervals $m$ are thus needed in the approximation. In other words, when the observations fluctuate considerably, the discretization of the state process does not need to be as fine as when the process has a higher persistence, in which case a finer discretization is required to detect the associated more gradual changes. A conservative choice of $m \geq 100$ is nevertheless advisable, provided that the resulting computational cost is acceptable. In our simulations, fitting the models with $m \geq 100$ took about one minute on a 1.6 GHz Intel® Core™ i5 CPU.

In a second simulation experiment, we ran an empirical check of the estimators’ consistency. Specifically, focusing on Setting 2 above, we simulated 200 datasets of $T = 2000, 5000, 10000$ observations each and then estimated the model parameters with fixed $m = 100$ . The results indicate that in this particular setting, the parameter estimators are approximately unbiased already for $T = 2000$ , with the precision increasing with increased sample size (see Figure 2).

Figure 2

Boxplots of relative bias of the estimated model parameters from 200 simulation runs for $T = 2000, 5000, 10000$ observations. True parameter values are $θ = 0.5$, $σ = 0.5$, and $α = 200$

4 Case study on delinquent behaviour in adolescence and young adulthood

4.1 Model formulation

We analyse data from the longitudinal research project Crime in the Modern City on deviant and delinquent behaviour of adolescents and young adults in Western Germany (for more details see Boers et al., 2010; Seddig and Reinecke, 2017). The survey was first conducted in the year 2002 and comprised students in the 7th grade at public schools, who were mostly 12 to 13 years old. This cohort was repeatedly interviewed by means of self-administered questionnaires over a study period of 16 years. In each survey, the participants were asked about various offences like graffiti spraying, shop-lifting, drug abuse, or assault with and without a weapon, and indicated how often they had committed each offence in the twelve months prior to the survey. The data collection, however, did not follow a regular sampling scheme, as the first eight waves of the panel study were administered annually, while the last four waves were conducted biennially. Further, due to wave nonresponse, meaning that some participants would not respond in one or more panel waves, the dataset contains missing values, which is quite common in longitudinal studies. As a consequence, the length of time intervals between consecutive observations is irregular and ranges from one to four years.

In this case study, we consider the total number of offences indicated in each survey, from which individual trajectories of delinquent behaviour can be constructed. We included all participants who committed at least one offence within the study period, resulting in 12 327 observations from 1 093 adolescents (467 male and 626 female). The distribution of the number of offences for different age classes and both genders is shown in Figure 3. No delinquent behaviour was the most frequent report (72.6% of observations), while overall the median number of offences, given that any were committed within the previous twelve months, is 3 (min: 1; max: 160).

Figure 3

Boxplots of the number of offences committed in the twelve months prior to the survey for different age classes and both genders. Outliers have been removed from the plot for clarity. The same figure including outliers is provided in Figure A2 in the Online Supplementary Material

The main aim is to investigate the persistence of the delinquency level, which is assumed to be a latent trait underlying the observed trajectories of adolescents’ and young adults’ delinquent behaviour. Therefore, we model the number of offences using an SSM, which we formulate in continuous time to address the irregular spacing of the observations as caused by the study design and the missing data. Arguably, the data could also be regarded as a yearly time series with missing data and hence modelled using a discrete-time process. However, a continuous-time process constitutes a convenient alternative, which directly accommodates the time gaps. To allow for possible overdispersion, we assume the number of offences to follow a negative binomial distribution (conditional on the states). As the study participants’ age and gender are known to affect their delinquent behaviour (e.g., Reinecke and Weins, 2013), we additionally include these covariates in the observation process. The observation process of the SSM is then specified as

$$\begin{aligned} Y_{τ} &\sim \text{NegBinom} (ν_{τ}, ϕ), \\ ν_{τ} &= \exp \bigl( X_{τ} + f_{1} (\text{age}_{τ}) + f_{2} (\text{age}_{τ}) \cdot \text{gender}_{τ} \bigr), \end{aligned} \tag{4.1}$$

where $ν_{τ}$ is the mean of observation number $τ$ (at time $t_{τ}$), and $ϕ$ is the dispersion parameter of the negative binomial distribution. The mean is modelled as a function of the current state $X_{τ}$ (i.e., the current delinquency level relative to the population mean for the relevant age group) as well as the covariate age and its interaction with gender. To allow for a non-linear relationship, as indicated by Figure 3, the effect of age on the number of offences committed is modelled nonparametrically. Specifically, we set $f_{i} (\text{age}_{τ}) = \sum_{l = 1}^{8} ω_{i, l} C_{l} (\text{age}_{τ})$, for $i = 1, 2$, using cubic B-spline basis functions $C_{l}$ and 12 equally spaced knots ranging from 7 to 35 (De Boor, 1978; Eilers and Marx, 1996). Note that as the observations correspond to the number of offences committed in the twelve months prior to the survey, the state process should be interpreted as the current last-12-month average delinquency level.
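To illustrate how the entries of $P (y_{τ})$ could be computed under the observation model in Equation (4.1), the sketch below evaluates a negative binomial log-probability in a mean/dispersion form. The exact parameterization is not spelled out in the text; the code assumes $\text{Var} (Y_{τ}) = ν_{τ} + ν_{τ}^{2} / ϕ$, so that small $ϕ$ implies large variation. The 0/1 coding of gender, the function names, and the way the spline values $f_{1}$, $f_{2}$ are passed in as plain numbers are likewise assumptions made for illustration.

```python
import math


def negbinom_logpmf(y, mean, phi):
    """log P(Y = y) for a negative binomial with mean `mean` and dispersion `phi`.

    Assumed parameterization: Var(Y) = mean + mean**2 / phi.
    """
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mean))
            + y * math.log(mean / (phi + mean)))


def observation_mean(state, age_effect, interaction_effect, gender):
    """nu = exp(X + f1(age) + f2(age) * gender), with the f-values supplied directly."""
    return math.exp(state + age_effect + interaction_effect * gender)


# Illustrative values only: state 0, f1(age) = log(3), f2(age) = 0, gender coded 1
nu = observation_mean(0.0, math.log(3.0), 0.0, 1)
log_p_zero = negbinom_logpmf(0, nu, 0.57)
```

Evaluating `negbinom_logpmf` at each midpoint $b_{i}^{*}$ in place of `state` would give the diagonal of $P (y_{τ})$ on the log scale.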

We further specify the state process to be an OU process with $μ = 0$ (cf. Equation (2.2)) such that an individual's delinquency level (or, more precisely, the deviation of the individual's delinquency level from the population mean) is persistent over time and changes gradually. Negative values of the state process then indicate that the individual is less inclined to delinquent activities, given their gender and age, whereas positive values indicate a higher inclination than would be expected based on gender and age. The parameters of interest, that is, the drift parameter and diffusion coefficient of the OU process for the state process, as well as the regression coefficients of the covariate effects and the dispersion parameter of the negative binomial distribution for the observation process, are estimated using maximum (approximate) likelihood as described in Section 2. For the state discretization, we set $m = 100$ and choose $[b_{0}, b_{m}] = [- 9, 9]$ as a possible range for the state process.

To assess whether the SSM formulation is actually needed to describe the structure in the data, we additionally fit a simple generalized additive model (GAM) without an underlying state process. This benchmark model is formulated analogously to Equation (4.1), omitting $X_{τ}$, and corresponds to the assumption that an individual's delinquency level is not persistent over time. We further tried a generalized additive mixed model (GAMM) including a random intercept term per individual, which can be seen as equivalent to the model from (4.1) but with constant $X_{τ}$ for each individual. Both the GAM and the GAMM were fitted in R using the gamlss package (Rigby and Stasinopoulos, 2005). To investigate how the SSM, within which a continuous state space is considered, performs compared to models which assume only a finite number of states, we further fit continuous-time HMMs with two to five (discrete) states. Finally, we simulate observations based on the fitted SSM to check if our model is able to reproduce the patterns found in the real data.

4.2 Results

According to the AIC, the continuous-time SSM is clearly favoured over the benchmark GAM (ΔAIC: 1375) as well as over the GAMM including a random intercept term (ΔAIC: 562). The parameter estimates associated with the OU process and the dispersion parameter of the negative binomial distribution are shown in Table 3. Regarding the dispersion parameter, its small value reflects the large variation in the number of offences. For the state process, the small value of the estimated drift parameter indicates fairly strong serial dependence, while the estimated diffusion coefficient shows that the deviations from zero can be large. In particular, the limiting distribution of the OU process is estimated as $X_{τ} \sim N (0, 2.23^{2})$, indicating that considerable differences in the delinquency levels of adolescents can be observed over time. This difference in and temporal persistence of latent delinquency levels can also be illustrated using simulated state trajectories based on the estimated parameters of the OU process (cf. Figure A3 in the Online Supplementary Material).

Table 3

Parameter estimates with 95% CIs for the drift parameter $θ$ and the diffusion coefficient $σ$ of the OU process as well as the dispersion parameter $ϕ$ of the negative binomial distribution. The CIs were calculated based on the observed Fisher information

parameter	estimate	95% CI
$θ$	0.222	[0.194; 0.255]
$σ$	1.489	[1.346; 1.647]
$ϕ$	0.570	[0.483; 0.674]
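As a quick plausibility check, the reported limiting standard deviation of 2.23 can be recovered from the estimates in Table 3. The calculation below assumes the OU parameterization $dX_{τ} = θ(μ - X_{τ})dτ + σ dB_{τ}$ (Equation (2.2) is not reproduced in this section), under which the stationary variance is $σ^{2}/(2θ)$ and the correlation between states a time gap $Δ$ apart is $e^{-θΔ}$.

```latex
\begin{align*}
\operatorname{Var}(X_{\tau}) &= \frac{\hat{\sigma}^{2}}{2\hat{\theta}}
  = \frac{1.489^{2}}{2 \times 0.222} \approx 4.99, \\
\operatorname{sd}(X_{\tau}) &\approx \sqrt{4.99} \approx 2.23, \\
\operatorname{Corr}(X_{\tau}, X_{\tau + \Delta}) &= e^{-\hat{\theta}\Delta}
  \approx 0.80 \quad \text{for } \Delta = 1 .
\end{align*}
```

Under this assumption, states one time unit apart remain strongly correlated, which matches the fairly strong serial dependence described in the text.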

The estimated effects of age and gender on the mean parameter of the negative binomial distribution are visualised in Figure 4. While the effect of age on the expected number of offences is quite similar for both genders, female adolescents generally display a lower level of delinquent behaviour than males, which corresponds to the current state of research (e.g., Reinecke and Weins, 2013). Overall, the effect of age is highly non-linear. Until the age of 14 to 15, there is an increase in delinquent behaviour, followed by a steady decline in the expected number of offences, which reflects the typical age-crime curve (e.g., Moffitt, 1993). During the twenties, the expected number of offences increases again, which here might mainly be caused by data collection issues, as young adults can commit additional offences that are not considered for adolescents.

Figure 4

Estimated effect of age on the expected number of offences for male (blue) and female (red) adolescents, respectively, given that the state equals 0

Because the SSM is recast within an HMM framework (cf. Section 2), we can gain additional insight into the delinquency levels of individuals by using the Viterbi algorithm to infer the most probable sequence of underlying states. Based on these decoded delinquency levels as well as the individuals’ gender and age, the expected number of offences can be calculated at each observation time. Such decoded trajectories are shown for eight male adolescents in Figure 5. As a result of the underlying delinquency levels, individuals’ trajectories of the expected number of offences deviate from the overall age trend and fluctuate around it. Moreover, different trajectory types are visible: while some adolescents have a permanently increased or reduced level of delinquency, others show early or late periods of increased delinquency levels.
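The decoding step can be illustrated with a generic Viterbi recursion in log space. This is a minimal sketch under stated assumptions rather than the code used for the case study: the function name `viterbi` and its array arguments are hypothetical, with one transition matrix per time gap so that irregular sampling is accommodated.

```python
import numpy as np

def viterbi(log_delta, log_gammas, log_obs):
    """Most probable state sequence of an HMM with time-varying transitions.

    log_delta  : (m,) log initial state probabilities
    log_gammas : (T-1, m, m) log transition matrices, one per time gap
    log_obs    : (T, m) log densities of each observation given each state
    """
    T, m = log_obs.shape
    xi = np.empty((T, m))                 # best log-probability of paths ending in each state
    back = np.zeros((T, m), dtype=int)    # best predecessor of each state at each time
    xi[0] = log_delta + log_obs[0]
    for t in range(1, T):
        cand = xi[t - 1][:, None] + log_gammas[t - 1]   # cand[i, j]: from state i to state j
        back[t] = cand.argmax(axis=0)
        xi[t] = cand.max(axis=0) + log_obs[t]
    states = np.empty(T, dtype=int)
    states[-1] = xi[-1].argmax()
    for t in range(T - 2, -1, -1):        # backtrack from the final state
        states[t] = back[t + 1, states[t + 1]]
    return states
```

In the discretized SSM, each decoded index would be mapped back to the midpoint of the corresponding state interval to obtain a decoded delinquency level.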

Figure 5

Example trajectories of the logarithm of the expected number of offences for eight male individuals based on their decoded delinquency level at each observation time. The thicker, black line represents the expected trajectory for male adolescents, when their delinquency level is in equilibrium

When simulating observations based on the fitted SSM, the resulting synthetic data reflect quite well the general distribution of the number of offences observed for different age classes and both genders (cf. Figure A4 in the Online Supplementary Material). However, the simulations indicate some lack of fit with regard to the distribution of the observation process as well as to the smoothness of the underlying state process. In particular, the model can generate observations which are (much) larger than the maximum value observed in the real data due to the heavy right tail of the fitted negative binomial distribution (though this applies to only 0.23% of the simulated data). Regarding the underlying process, the simulated state trajectories tend to show stronger fluctuations in the delinquency levels than we observed in the decoded state sequences of the case study. The model fit of the continuous-time SSM might thus be improved by using a smoother process than the OU process to model the evolution of the underlying state process.
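A simulation of this kind can be sketched as follows. The fragment is illustrative only and rests on several assumptions that are not confirmed by this section: a log link in which the state enters the linear predictor additively, a negative binomial parameterization with variance $\text{mean} + \text{mean}^{2}/ϕ$, and an OU process with $μ = 0$ started in its limiting distribution. The function `simulate_ssm` and the covariate predictor `eta` are hypothetical placeholders for the fitted age and gender effects.

```python
import numpy as np

def simulate_ssm(times, eta, theta, sigma, phi, rng):
    """Simulate OU states and negative binomial counts at irregular times.

    times : increasing observation times
    eta   : covariate part of the linear predictor at each time (placeholder)
    """
    times = np.asarray(times, dtype=float)
    eta = np.asarray(eta, dtype=float)
    stat_sd = sigma / np.sqrt(2.0 * theta)        # limiting standard deviation
    x = np.empty(len(times))
    x[0] = rng.normal(0.0, stat_sd)               # assumption: start in the limiting distribution
    for k in range(1, len(times)):
        decay = np.exp(-theta * (times[k] - times[k - 1]))
        sd = stat_sd * np.sqrt(1.0 - decay**2)
        x[k] = rng.normal(x[k - 1] * decay, sd)   # exact OU transition with mu = 0
    mean = np.exp(eta + x)                        # assumption: log link, additive state
    # numpy's (n, p) parameterization: mean n(1-p)/p, here equal to `mean`
    y = rng.negative_binomial(phi, phi / (phi + mean))
    return x, y
```

Repeating such simulations for many synthetic individuals and comparing the simulated counts with the observed ones gives the kind of check described above.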

Furthermore, there are other ways in which the model formulation used in the case study could be improved and extended for a more comprehensive analysis of adolescents’ delinquent behaviour. In particular, additional variables possibly affecting the delinquency level could be considered in the model, and these could be individual-specific, time-varying, or both. Incorporating time-varying covariates into the continuous-time state process is rather challenging as it renders the likelihood calculation analytically intractable. An important exception is the case where the covariate of interest is piecewise constant over time (see, e.g., Faddy, 1976; Kay, 1986). Moreover, the current model formulation does not account for possible heterogeneity across survey participants. As indicated in Figure 5, some adolescents’ expected delinquency is consistently higher or lower than the population mean, which could be addressed for example by modelling the long-term mean $μ$ of the OU process as a random effect. Modelling individual variability via random effects would however further increase the computational cost associated with maximizing the likelihood, which is already fairly high.

Finally, it is not a priori clear whether the data are more adequately modelled using a continuous- or a discrete-valued underlying state process. A comparison of our SSM with 2- to 5-state HMMs reveals that, based on the AIC, continuous-time HMMs with more than two states are favoured over the SSM assuming a continuous state space (cf. Table A1 in the Online Supplementary Material). However, choosing an adequate number of latent delinquency levels comes with its own challenges. In particular, for HMM-like models, the AIC generally tends to select models with a larger than plausible number of states, hence impeding a meaningful interpretation of the states (see, e.g., Pohle et al., 2017). In our case, the 5-state model was chosen, but we did not test models with more than five states. Overall, the case study illustrates the versatility of our modelling and the associated inferential framework, for example regarding the specification of non-Gaussian distributions in the observation process as well as nonparametric covariate effects.

5 Discussion

In this contribution, we developed a flexible framework for formulating and estimating general continuous-time SSMs. These are latent-state models suited to sequential observations that are irregularly spaced in time, that is, data to which discrete-time models are not (directly) applicable. In some applications, for example in biology (e.g., Runde et al., 2020), psychology (e.g., de Haan-Rietdijk et al., 2017), or finance (e.g., Kim and Stoffer, 2008), irregularly spaced observations are simply treated as if they followed a regular sampling scheme, or are forced into a sequence with regular (i.e., equidistant) time intervals based on data aggregation or imputation. These aggregated or imputed data are then analysed using discrete-time models, which are less technically challenging than their continuous-time counterparts. However, temporal aggregation of continuous-time processes discards information on the exact observation times and introduces subjectivity concerning the choice of the discrete-time modelling resolution, while imputation methods for generating regular time intervals introduce additional uncertainty, which is why both approaches possibly produce biased estimates (see, e.g., Yip and Wang, 2002; Delsing et al., 2005; Barbour et al., 2013; Kleinke et al., 2021). Therefore, continuous-time models are generally preferable when data are collected at irregular points in time. These models are not only conceptually appealing, as their interpretation does not depend on the time resolution of the data at hand, but also avoid the pitfalls mentioned above. These benefits come at the cost of increased mathematical and computational complexity, especially for SSMs with non-linear and non-Gaussian processes.

While we are not the first to consider continuous-time SSMs, existing models often focus on a particular data application and hence are very case-specific (e.g., Dennis and Ponciano, 2014; Albertsen et al., 2015; Niu et al., 2016). In particular, existing approaches usually make restrictive model assumptions to simplify parameter estimation, for example requiring the SSM to be linear and Gaussian to enable the application of the Kalman filter (e.g., Johnson et al., 2008; Tandeo et al., 2011; Koopman et al., 2018; Lavielle, 2018; Jonsen et al., 2020). In contrast, the maximum (approximate) likelihood approach we propose here is not tied to specific distributional or linearity assumptions, thus allowing for both non-linear and non-Gaussian specifications of the state and observation process. Our method, however, is by no means the only method to fit continuous-time SSMs: apart from the Kalman filter, which can be used for linear and Gaussian SSMs, MCMC methods (Niu et al., 2016) and Laplace approximation techniques (Albertsen et al., 2015; Michelot et al., 2021), as implemented in the R package Template Model Builder (Kristensen et al., 2016), have been developed for statistical inference in continuous-time SSMs. We do not claim that the modelling approach presented here is superior to such alternative estimation techniques, but it offers the convenience of the continuous-time HMM framework and its corresponding efficient algorithms. The latter proves beneficial not only with respect to model fitting but also for decoding the most probable underlying state trajectories. Moreover, only minor changes in the corresponding code for the likelihood calculation are required to consider different distributions or non-linear relationships in either the observation or state process, provided that the transition density is known in explicit form. A major caveat of the approach, however, is that it suffers from a curse of dimensionality when considering multivariate state processes (e.g., Langrock, 2011). In conclusion, our approach constitutes an accessible and very flexible framework for modelling irregularly spaced sequential data driven by a one-dimensional underlying state process.
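To make concrete why only minor code changes are needed when the model specification changes, the approximate likelihood can be written as a scaled forward recursion in which the observation distribution enters only through an array of density values and the state process only through the transition matrices. The sketch below is a generic illustration under these assumptions, not the authors' R implementation; the function name `approx_log_likelihood` and its arguments are hypothetical.

```python
import numpy as np

def approx_log_likelihood(delta, gammas, obs_dens):
    """Log-likelihood of a discretized SSM via the scaled forward algorithm.

    delta    : (m,) initial distribution over the state intervals
    gammas   : (T-1, m, m) transition matrices, one per (irregular) time gap
    obs_dens : (T, m) observation densities evaluated at the interval midpoints
    """
    alpha = delta * obs_dens[0]
    llk = 0.0
    for t in range(len(obs_dens)):
        if t > 0:
            alpha = (alpha @ gammas[t - 1]) * obs_dens[t]
        scale = alpha.sum()
        llk += np.log(scale)     # accumulate the log scaling constants
        alpha = alpha / scale    # rescale to avoid numerical underflow
    return llk
```

Swapping the negative binomial for another observation distribution would change only how `obs_dens` is filled, and a different state process with an explicit transition density would change only how `gammas` is computed. The cost per step grows with the square of the number of intervals, which is the source of the curse of dimensionality for multivariate states.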

Supplementary material

In the Online Supplementary Material, we provide the R code for the simulation experiments conducted in Section 3 and the R code used for the case study. The data for the case study cannot be shared publicly for privacy reasons but are available on request from the authors. Therefore, artificially simulated data, based on the case study results and structured exactly like the real data, are provided for illustration. The supplementary material can be found at: http://www.statmod.org/smij/archive.html.

Footnotes

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The authors disclosed the receipt of the following financial support for the research, authorship and/or publication of this article: This research was funded by the German Research Foundation (DFG) as part of the SFB TRR 212 (NC3), project number 316099922.

Acknowledgments

We would like to thank Christiane Fuchs for her valuable input on SDEs. We are also grateful to two anonymous reviewers for their insightful and very useful feedback that helped us to improve this article.

References

Albertsen (2019) Generalizing the first-difference correlated random walk for marine animal movement data. Scientific Reports, 9, 4017.

Albertsen, Whoriskey, Yurkowski, Nielsen and Flemming (2015) Fast fitting of non-Gaussian state-space models to animal movement data via Template Model Builder. Ecology, 96, 2598–2604.

Amoros, King, Toyoda, Kumada, Johnson and Bird (2019) A continuous-time hidden Markov model for cancer surveillance using serum biomarkers with application to hepatocellular carcinoma. Metron, 77, 67–86.

Barbour, Ponciano and Lorenzen (2013) Apparent survival estimation from continuous mark-recapture/resighting data. Methods in Ecology and Evolution, 4, 846–53.

Bartolucci and De Luca (2003) Likelihood-based inference for asymmetric stochastic volatility models. Computational Statistics & Data Analysis, 42, 445–49.

Beyreuther, Carniel and Wassermann (2008) Continuous hidden Markov models: application to automatic earthquake detection and classification at Las Cañadas caldera, Tenerife. Journal of Volcanology and Geothermal Research, 176, 513–18.

Boers, Reinecke, Seddig and Mariotti (2010) Explaining the development of adolescent violent delinquency. European Journal of Criminology, 7, 499–520.

Bulla and Bulla (2006) Stylized facts of financial time series and hidden semi-Markov models. Computational Statistics & Data Analysis, 51, 2192–2209.

Cappé, Moulines and Rydén (2005) Inference in Hidden Markov Models. New York, NY: Springer.

Cerbone, Ricciardi and Sacerdote (1981) Mean variance and skewness of the first passage time for the Ornstein-Uhlenbeck process. Cybernetics and Systems, 12, 395–429.

Conn and Cooch (2009) Multistate capture-recapture analysis under imperfect state observation: an application to disease models. Journal of Applied Ecology, 46, 486–92.

De Boor (1978) A Practical Guide to Splines. New York, NY: Springer.

de Haan-Rietdijk, Voelkle, Keijsers and Hamaker (2017) Discrete- vs. continuous-time modeling of unequally spaced experience sampling method data. Frontiers in Psychology, 8, 1849.

Delsing MJMH, Oud JHL and De Bruyn EEJ (2005) Assessment of bidirectional influences between family relationships and adolescent problem behavior. European Journal of Psychological Assessment, 21, 226–31.

Dennis and Ponciano (2014) Density-dependent state-space model for population-abundance data with unequal time intervals. Ecology, 95, 2069–76.

Durbin and Koopman (2012) Time Series Analysis by State Space Methods. Oxford: Oxford University Press.

Eilers PHC and Marx (1996) Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–102.

Faddy (1976) A note on the general time-dependent stochastic compartmental model. Biometrics, 32, 443–48.

Fridman and Harris (1998) A maximum likelihood approach for non-Gaussian stochastic volatility models. Journal of Business & Economic Statistics, 16, 284–91.

Jackson, Sharples, Thompson, Duffy and Couto (2003) Multistate Markov models for disease progression with classification error. The Statistician, 52, 193–209.

Johnson, London, Lea M.-A and Durban (2008) Continuous-time correlated random walk model for animal telemetry data. Ecology, 89, 1208–15.

Jonsen, Patterson, Costa, Doherty, Godley, Grecian, Guinet, Hoenner, Kienle, Robinson, Votier, Whiting, Witt, Hindell, Harcourt and McMahon (2020) A continuous-time state-space model for rapid quality control of Argos locations from animal-borne tags. Movement Ecology, 8, 1–13.

Kay (1986) A Markov model for analysing cancer markers and disease states in survival studies. Biometrics, 42, 855–65.

Kim and Stoffer (2008) Fitting stochastic volatility models in the presence of irregular sampling via particle methods and the EM algorithm. Journal of Time Series Analysis, 29, 811–33.

Kim, Shephard and Chib (1998) Stochastic volatility: likelihood inference and comparison with ARCH models. The Review of Economic Studies, 65, 361–93.

King and Langrock (2016) Semi-Markov Arnason–Schwarz models. Biometrics, 72, 619–28.

Kitagawa (1987) Non-Gaussian state-space modeling of nonstationary time series. Journal of the American Statistical Association, 82, 1032–41.

Kleinke, Reinecke and Weins (2021) The development of delinquency during adolescence: a comparison of missing data techniques revisited. Quality & Quantity, 55, 877–95.

Koopman, Commandeur JJF, Bijleveld and Vujić (2018) Continuous time state space modelling with an application to high-frequency road traffic data. In Continuous Time Modeling in the Behavioral and Related Sciences, edited by Van Montfort, Oud JHL and Voelkle, pages 305–15. Cham: Springer.

Kristensen, Nielsen, Berg, Skaug and Bell (2016) TMB: Automatic differentiation and Laplace approximation. Journal of Statistical Software, 70, 1–21.

Langrock (2011) Some applications of nonlinear and non-Gaussian state-space modelling by means of hidden Markov models. Journal of Applied Statistics, 38, 2955–70.

Lavielle (2018) Pharmacometrics models with hidden Markovian dynamics. Journal of Pharmacokinetics and Pharmacodynamics, 45, 91–105.

McCrea, Morgan BJT, Gimenez, Besbeas, Lebreton J.-D and Bregnballe (2010) Multi-site integrated population modelling. Journal of Agricultural, Biological and Environmental Statistics, 15, 539–61.

Mews, Langrock, King and Quick (in press) Multi-state capture-recapture models for irregularly sampled data. Annals of Applied Statistics.

Michelot and Blackwell (2019) State-switching continuous-time correlated random walks. Methods in Ecology and Evolution, 10, 637–49.

Michelot, Glennie, Harris and Thomas (2021) Varying-coefficient stochastic differential equations with applications in ecology. Journal of Agricultural, Biological and Environmental Statistics, 26, 446–63.

Michelot, Gloaguen, Blackwell and Étienne M.-P (2019) The Langevin diffusion as a continuous-time model of animal movement and habitat selection. Methods in Ecology and Evolution, 10, 1894–1907.

Moffitt (1993) Adolescence-limited and life-course-persistent antisocial behavior: a developmental taxonomy. Psychological Review, 100, 674–701.

Niu, Blackwell and Skarin (2016) Modeling interdependent animal movement in continuous time. Biometrics, 72, 315–24.

Ötting, Langrock, Deutscher and Leos-Barajas (2020) The hot hand in professional darts. Journal of the Royal Statistical Society: Series A (Statistics in Society), 183, 565–80.

Oravecz, Tuerlinckx and Vandekerckhove (2011) A hierarchical latent stochastic differential equation model for affective dynamics. Psychological Methods, 16, 468–90.

Pedersen (1995a) A new approach to maximum likelihood estimation for stochastic differential equations based on discrete observations. Scandinavian Journal of Statistics, 22, 55–71.

Pedersen (1995b) Consistency and asymptotic normality of an approximate maximum likelihood estimator for discretely observed diffusion processes. Bernoulli, 1, 257–79.

Pedersen, Patterson, Thygesen and Madsen (2011a) Estimating animal behavior and residency from movement data. Oikos, 120, 1281–90.

Pedersen, Righton, Thygesen, Andersen and Madsen (2008) Geolocation of North Sea cod (Gadus morhua) using hidden Markov models and behavioural switching. Canadian Journal of Fisheries and Aquatic Sciences, 65, 2367–77.

Pedersen, Thygesen and Madsen (2011b) Nonlinear tracking in a diffusion process with a Bayesian filter and the finite element method. Computational Statistics & Data Analysis, 55, 280–90.

Pohle, Langrock, Van Beest and Schmidt (2017) Selecting the number of states in hidden Markov models: pragmatic solutions illustrated using animal movement. Journal of Agricultural, Biological and Environmental Statistics, 22, 270–93.

Reinecke and Weins (2013) The development of delinquency during adolescence: a comparison of missing data techniques. Quality & Quantity, 47, 3319–34.

Rigby and Stasinopoulos (2005) Generalized additive models for location, scale and shape. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54, 507–54.

Runde, Michelot, Bacheler, Shertzer and Buckel (2020) Assigning fates in telemetry studies using hidden Markov models: an application to deepwater groupers released with descender devices. North American Journal of Fisheries Management, 40, 1417–34.

Seddig and Reinecke (2017) Exploration and explanation of adolescent self-reported delinquency trajectories in the Crimoc study. In The Routledge International Handbook of Life-Course Criminology, edited by Blokland and Van der Geest, pages 159–178. New York, NY: Routledge.

Tandeo, Ailliot and Autret (2011) Linear Gaussian state-space model with irregular sampling: application to sea surface temperature. Stochastic Environmental Research and Risk Assessment, 25, 793–804.

Thygesen, Pedersen and Madsen (2009) Geolocating fish using hidden Markov models and data storage tags. In Tagging and Tracking of Marine Animals with Electronic Devices, edited by Nielsen, Arrizabalaga, Fragoso, Hobday, Lutcavage and Sibert, pages 277–293. New York, NY: Springer.

Van Beest, Mews, Elkenkamp, Schuhmann, Tsolak, Wobbe, Bartolino, Bastardie, Dietz, von Dorrien, Galatius, Karlsson, McConnell, Nabe-Nielsen, Olsen, Teilmann and Langrock (2019) Classifying grey seal behaviour in relation to environmental variability and commercial fishing activity: A multivariate hidden Markov model. Scientific Reports, 9, 5642.

Williams, Storlie, Therneau, Jack Jr and Hannig (2020) A Bayesian approach to multistate hidden Markov models: application to dementia progression. Journal of the American Statistical Association, 115, 16–31.

Yip PSF and Wang (2002) A unified parametric regression model for recapture studies with random removals in continuous time. Biometrics, 58, 192–99.

Zucchini, MacDonald and Langrock (2016) Hidden Markov Models for Time Series: An Introduction Using R. Boca Raton, FL: Chapman & Hall/CRC.

Maximum approximate likelihood estimation of general continuous-time state-space models
