Abstract
External events, commonly known as interventions, often affect time series of counts. This research introduces a class of transfer function models that include four different types of interventions on integer-valued time series: abrupt start and abrupt decay (additive outlier), abrupt start and gradual decay (transient shift), abrupt start and permanent effect (level shift) and gradual start and permanent effect. We propose integer-valued transfer function models incorporating a generalized Poisson, log-linear generalized Poisson or negative binomial distribution to estimate and detect these four types of interventions in a time series of counts. We use Bayesian methods based on adaptive Markov chain Monte Carlo (MCMC) algorithms for estimation, and we employ the deviance information criterion (DIC), posterior odds ratios and the mean squared standardized residual for model comparisons. As an illustration, this study evaluates the effectiveness of our methods through a simulation study and an application to crime data in Albury City, New South Wales (NSW), Australia. Simulation results show that the MCMC procedure is reasonably effective. The empirical outcome also reveals that the proposed models are able to successfully detect the locations and types of interventions.
Introduction
One of the major challenges in time series modelling is the presence of outliers as the series is often affected by external events commonly known as interventions. These various external events can be major corporate, political or economic policy initiatives or changes, technological advancements, work stoppages, sales promotions, advertising and so forth. These interventions influence the time series data either on a single or a few datapoints, or they impact the whole process from some specific time
Modelling time series of counts is an important task in research today and has garnered significant attention for decades, with applications in various scientific areas such as health science, economics, finance, environment, criminology and epidemiology. McKenzie (1985) and Al-Osh and Alzaid (1987) introduce the first-order non-negative integer-valued autoregressive (INAR) model, which is based on a binomial thinning operator. Most count data models make use of the Poisson distribution, but it is unable to describe over-dispersion. Ferland et al. (2006) propose the integer-valued generalized autoregressive conditional heteroscedastic (GARCH) model to handle the over-dispersion problem, and subsequently Fokianos and Tjøstheim (2011) consider a log-linear Poisson model for time series of counts. Our study therefore introduces a class of transfer function models for time series of counts that covers interventions in integer-valued GARCH processes as special cases.
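To make the integer-valued GARCH idea concrete, the following is a minimal sketch of simulating a Poisson integer-valued GARCH(1,1) process, in which the conditional intensity depends linearly on the previous count and the previous intensity. The function names, parameter values and the choice of starting the recursion at the stationary mean are illustrative assumptions and do not come from this article.

```python
import math
import random


def poisson_draw(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_ingarch(n, alpha0, alpha1, beta1, seed=0):
    """Simulate X_t | past ~ Poisson(lam_t) with lam_t = alpha0 + alpha1*X_{t-1} + beta1*lam_{t-1}."""
    rng = random.Random(seed)
    lam = alpha0 / (1.0 - alpha1 - beta1)  # assumption: start at the stationary mean
    x = poisson_draw(lam, rng)
    counts, intensities = [x], [lam]
    for _ in range(1, n):
        lam = alpha0 + alpha1 * x + beta1 * lam
        x = poisson_draw(lam, rng)
        counts.append(x)
        intensities.append(lam)
    return counts, intensities


# Hypothetical parameter values chosen so that alpha1 + beta1 < 1.
counts, intensities = simulate_ingarch(n=500, alpha0=1.0, alpha1=0.3, beta1=0.4)
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(f"sample mean {mean:.2f}, sample variance {variance:.2f}")
```

Comparing the printed sample mean and variance gives a quick informal check of how much over-dispersion the feedback mechanism generates for a given parameter choice.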
Fokianos and Fried (2010) and Fokianos and Fried (2012) present three types of intervention in their proposed models, namely additive outliers, transient shifts and level shifts, for Poisson and log-linear Poisson integer-valued GARCH processes. Liboschik et al. (2014) also look into these three types of intervention for the Poisson integer-valued GARCH process, both for a known time and an unknown time of intervention. Chen and Lee (2016) target a class of zero-inflated generalized Poisson (GP) integer-valued GARCH models with a structural break. We introduce herein a class of transfer function models that incorporate four different types of intervention on integer-valued time series processes: abrupt start and abrupt decay (additive outlier), abrupt start and gradual decay (transient shift), abrupt start and permanent effect (level shift) and gradual start and permanent effect (see Chapter 14 in Wei (2006) for the ARIMA process). Our study includes intervention effects from the integer-valued GARCH models of Fokianos and Fried (2010) and Liboschik et al. (2014). The novelty of this study's contribution is that we detect the four types of intervention effects by incorporating a ratio of two finite (rational) polynomials in the proposed models, so that these other models arise as special cases of ours.
Frequentist estimation of the traditional GARCH(1,1) model does not perform well at sample sizes of 500 or 1,000 (see Karmakar and Roy (2021)). A Bayesian formulation alleviates this problem, as it does not require a large sample size. A similar situation arises for integer-valued GARCH models. We offer Bayesian methods to detect and estimate the four types of intervention effects for time series of counts with GP or negative binomial (NB) distributions. The GP distribution is a mixture of Poisson distributions (Joe and Zhu, 2005) and a very versatile discrete distribution with applications in various areas of study. The advantage of the GP integer-valued GARCH model is that it accommodates over-dispersion (Chen and Lee, 2016). The NB distribution frequently appears in research studies to describe over-dispersed count data, such as Zhu (2011), Chen and Lee (2017), Chen and Khamthong (2020) and Chen et al. (2021), among others. We aim to estimate and detect the locations of interventions under GP, log-linear GP and NB integer-valued transfer function models for time series count data in a Bayesian framework.
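As a small numerical illustration of why the GP distribution accommodates over-dispersion, the sketch below evaluates the generalized Poisson probability mass function in Consul's textbook parameterization, $P(X=x) = \theta(\theta+\phi x)^{x-1}e^{-(\theta+\phi x)}/x!$ with $\theta>0$ and $0\le\phi<1$, whose mean is $\theta/(1-\phi)$ and variance is $\theta/(1-\phi)^3$. This parameterization and the symbol names are assumptions made for illustration; the parameterization used in this article may differ.

```python
import math


def gp_log_pmf(x, theta, phi):
    """Log pmf of Consul's generalized Poisson for theta > 0 and 0 <= phi < 1."""
    return (math.log(theta) + (x - 1) * math.log(theta + phi * x)
            - (theta + phi * x) - math.lgamma(x + 1))


def gp_moments(theta, phi, upper=400):
    """Sum the pmf over 0..upper-1 to get total mass, mean and variance numerically."""
    probs = [math.exp(gp_log_pmf(x, theta, phi)) for x in range(upper)]
    total = sum(probs)
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return total, mean, var


# Hypothetical values: theta = 2 and phi = 0.3.
total, mean, var = gp_moments(theta=2.0, phi=0.3)
print(f"mass {total:.6f}, mean {mean:.4f}, variance {var:.4f}")
```

Setting `phi=0.0` recovers the ordinary Poisson distribution, for which the mean and variance coincide; any `phi` in (0, 1) makes the variance exceed the mean.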
Many papers in the literature have successfully employed Bayesian techniques for inference or forecasting with models in the integer-valued GARCH family; see Fried et al. (2015), Chen and Lee (2016), Chen and Lee (2017), Chen et al. (2019) and Chen et al. (2021), among others. We also employ Bayesian techniques for detection and estimation based on adaptive Markov chain Monte Carlo (MCMC) methods, which offer several advantages: (a) Bayesian methods do not rely on asymptotic properties, which can be an obstacle for frequentist methods in small-sample situations; (b) these methods allow for simultaneous estimation of all unknown parameters and locations of interventions; (c) they enable efficient and flexible handling of complex models; and (d) they properly impose parameter constraints as part of the prior distribution.
We employ the deviance information criterion (DIC) of Spiegelhalter et al. (2002) and the mean squared standardized residual (MSSR) for model comparisons. In the context of Bayesian inference, one may frame hypothesis testing as a special case of model comparison. This study further presents the use of the posterior odds ratio (POR) to test for one or multiple interventions at unknown locations; the POR summarizes the relative posterior support for two hypotheses as a single number.
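For readers unfamiliar with DIC, the sketch below computes it from posterior draws using the standard definition of Spiegelhalter et al. (2002): DIC equals the posterior mean deviance plus the effective number of parameters $p_D$, where $p_D$ is the posterior mean deviance minus the deviance at the posterior mean. To keep the example self-contained it uses an i.i.d. Poisson model with a conjugate Gamma(1, 1) prior and made-up counts; these are assumptions for illustration and not the models or data of this article.

```python
import math
import random


def poisson_deviance(rate, counts):
    """Deviance D(rate) = -2 * log-likelihood of i.i.d. Poisson counts."""
    loglik = sum(c * math.log(rate) - rate - math.lgamma(c + 1) for c in counts)
    return -2.0 * loglik


def dic_from_draws(draws, counts):
    """Return (DIC, p_D): DIC = mean deviance + p_D, p_D = mean deviance - D(posterior mean)."""
    mean_deviance = sum(poisson_deviance(d, counts) for d in draws) / len(draws)
    deviance_at_mean = poisson_deviance(sum(draws) / len(draws), counts)
    p_d = mean_deviance - deviance_at_mean
    return mean_deviance + p_d, p_d


# Hypothetical counts; the Gamma(1, 1) prior gives a Gamma posterior in closed form.
counts = [3, 1, 4, 2, 5, 3, 2, 0, 4, 3, 6, 2, 3, 1, 4, 2, 3, 5, 2, 3]
rng = random.Random(1)
shape, rate = 1.0 + sum(counts), 1.0 + len(counts)
draws = [rng.gammavariate(shape, 1.0 / rate) for _ in range(5000)]  # second argument is the scale

dic, p_d = dic_from_draws(draws, counts)
print(f"DIC {dic:.2f}, effective number of parameters {p_d:.2f}")
```

With a single free parameter the effective number of parameters should come out close to one, which is a useful sanity check before applying the same recipe to MCMC output from a richer model.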
The rest of the article runs as follows. Section 2 introduces the GP, log-linear GP and NB transfer function models. Section 3 presents Bayesian MCMC methods to estimate the parameters and addresses model selection for the specified models. Section 4 performs a simulation study for illustration. In particular, we study misspecification of the distribution and the autoregressive order. Section 5 applies the proposed transfer function models to three categories of crime data in Albury City, New South Wales (NSW), Australia. Section 6 provides concluding remarks.
Transfer function models
This section presents the development of the proposed transfer function models. Here, we incorporate the four types of intervention effects into GP, log-linear GP and NB integer-valued transfer function models. A random variable
where
where
The first two intervention effects on conditional intensity are ‘abrupt start abrupt decay’ and ‘abrupt start gradual decay’, which correspond to an additive outlier and a transient shift, respectively.
We call
The set-ups for Inv-1 and Inv-2 are the same as in Liboschik et al. (2014), where the intervention effects are not propagated via the feedback mechanism of the conditional intensity, but only via the contaminated observations. Let
Inv-1 corresponds to the additive outlier in conditional intensity at time
Let
(Inv-2A): For Inv-2A, $\kappa_t$ can be expressed as follows:
(Inv-2B): For Inv-2B, $\kappa_t$ can be derived as follows:
Note that
We can add another type of intervention at a conditional intensity, known as level shift and gradual start with permanent effect based on a step function,
where the definitions of
(Inv-3): Abrupt start permanent effect: (s, q) = (0, 0).
(Inv-4): Gradual start permanent effect: (s, q) = (0, 1).
Let
where the impact displays a gradual start and a permanent effect. When this intervention occurs in the conditional intensity, we have:
(Inv-4A): For Inv-4A, $\kappa_t$ can be obtained as follows:
(Inv-4B): For Inv-4B, $\kappa_t$ can be derived as follows:
The impact from both Inv-4A and Inv-4B displays a gradual start and a permanent effect, but at different speeds. The rate of level shift from Inv-4A is faster than that from Inv-4B. Figure 1 illustrates both intervention effects. One can clearly see the impact from Inv-4A in Figure 1 with an abrupt increase and permanent effect starting at
Figure 1. Simulated data under the GP integer-valued transfer function for Inv-2A, Inv-2B, Inv-4A and Inv-4B.
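The qualitative shapes of the four intervention effects can be reproduced with the textbook first-order transfer function $\omega/(1-\delta B)$ applied to a pulse indicator $P_t$ or a step indicator $S_t$ (see Chapter 14 in Wei (2006)). The sketch below is only an illustration under that assumption: the recursion $\kappa_t = \delta\kappa_{t-1} + \omega I_t$, the function name and the values of $\omega$, $\delta$ and the intervention time are hypothetical, and it is not claimed to match the exact Inv-2A, Inv-2B, Inv-4A and Inv-4B specifications of this article.

```python
def intervention_response(omega, delta, indicator):
    """Apply omega / (1 - delta*B) to an indicator via kappa_t = delta*kappa_{t-1} + omega*I_t."""
    kappa, previous = [], 0.0
    for i_t in indicator:
        previous = delta * previous + omega * i_t
        kappa.append(previous)
    return kappa


n, T = 12, 4  # hypothetical series length and 0-based intervention time
pulse = [1.0 if t == T else 0.0 for t in range(n)]  # P_t: one at time T only
step = [1.0 if t >= T else 0.0 for t in range(n)]   # S_t: one from time T onwards

additive_outlier = intervention_response(2.0, 0.0, pulse)   # abrupt start, abrupt decay
transient_shift = intervention_response(2.0, 0.5, pulse)    # abrupt start, gradual decay
level_shift = intervention_response(2.0, 0.0, step)         # abrupt start, permanent effect
gradual_permanent = intervention_response(2.0, 0.5, step)   # gradual start, permanent effect

for name, series in [("additive outlier", additive_outlier),
                     ("transient shift", transient_shift),
                     ("level shift", level_shift),
                     ("gradual permanent", gradual_permanent)]:
    print(name, [round(v, 3) for v in series])
```

In this sketch the pulse responses return to zero, either immediately or geometrically at rate $\delta$, while the step responses settle at a new level, which equals $\omega/(1-\delta)$ when $0<\delta<1$.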
We now turn to the log-linear GP integer-valued transfer function models with a general form of intervention.
We alternatively have the following:
where
(Inv-2A): Abrupt start abrupt decay (additive outlier) at $\nu_T$: $(s, q) = (0, 0)$ and $I_t = P_t$.
(Inv-2B): Abrupt start gradual decay (transient shift): $(s, q) = (0, 1)$ and $I_t = P_t$.
(Inv-4A): Gradual start permanent effect: $(s, q) = (0, 0)$ and $I_t = S_t$.
(Inv-4B): Gradual start permanent effect: $(s, q) = (0, 1)$ and $I_t = S_t$.
To ensure stability of the original process, we impose the following conditions from Fokianos and Tjøstheim (2011).
Chen et al. (2019) employ the condition in (2.13), because there is stronger dependence under it, as discussed by Fokianos and Tjøstheim (2011).
The NB distribution has become popular in research studies due to its flexibility. We also adopt an NB integer-valued transfer function model herein. We assume that
We alternatively have the following:
where
(Inv-2A): Abrupt start abrupt decay (additive outlier) at $\kappa_T$: $(s, q) = (0, 0)$ and $I_t = P_t$.
(Inv-2B): Abrupt start gradual decay (transient shift): $(s, q) = (0, 1)$ and $I_t = P_t$.
(Inv-4A): Gradual start permanent effect: $(s, q) = (0, 0)$ and $I_t = S_t$.
(Inv-4B): Gradual start permanent effect: $(s, q) = (0, 1)$ and $I_t = S_t$.
Zhu (2011) states the stationarity condition for an NB integer-valued GARCH model as:
We incorporate this condition into our prior.
Bayesian methods generally require the specification of a prior distribution
Let
where
We make use of the following parameter groups for both GP and log-linear GP integer-valued transfer function models: (a)
GP integer-valued transfer function: p
Log-linear GP integer-valued transfer function: p
NB integer-valued transfer function: p
We now discuss the prior set-up for the second group,
When the location of
For a multiple transfer function model, without loss of generality, we consider two interventions in the integer-valued transfer function model. We assume the prior of (
where
By the Bayes theorem, the conditional posterior distribution is proportional to the likelihood function multiplied by the prior density for each group. The conditional posterior distribution for parameter group
where
Set up initial values of
When
Accept
where
When
Update
where
When
When
Sample
Sample
Go back to Step 2.
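The accept/reject pattern in the steps above can be illustrated with a plain random-walk Metropolis sampler. The sketch below is a simplified stand-in rather than the adaptive scheme of this article: it targets a Poisson integer-valued GARCH(1,1) model without any intervention term, uses a flat prior restricted to the region $\alpha_0>0$, $\alpha_1,\beta_1\ge 0$, $\alpha_1+\beta_1<1$ so that the constraints enter through the prior, and runs on made-up counts. All names, the fixed step size and the data are assumptions for illustration.

```python
import math
import random


def log_posterior(params, counts):
    """Poisson INGARCH(1,1) log-likelihood plus a flat prior on the constrained region."""
    alpha0, alpha1, beta1 = params
    if alpha0 <= 0 or alpha1 < 0 or beta1 < 0 or alpha1 + beta1 >= 1:
        return -math.inf  # prior density is zero outside the constraint set
    lam = sum(counts) / len(counts)  # assumption: initialize intensity at the sample mean
    total = 0.0
    for prev, cur in zip(counts, counts[1:]):
        lam = alpha0 + alpha1 * prev + beta1 * lam
        total += cur * math.log(lam) - lam  # log x! omitted: constant in the parameters
    return total


def random_walk_metropolis(counts, start, step, n_iter, seed=0):
    """Return the chain of draws and the acceptance rate; start must satisfy the constraints."""
    rng = random.Random(seed)
    current, current_lp = list(start), log_posterior(start, counts)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = [p + rng.gauss(0.0, step) for p in current]
        proposal_lp = log_posterior(proposal, counts)
        # Accept with probability min(1, posterior ratio); invalid proposals have ratio 0.
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws.append(tuple(current))
    return draws, accepted / n_iter


# Hypothetical monthly counts, used only to exercise the sampler.
counts = [2, 3, 1, 4, 6, 5, 3, 2, 2, 4, 7, 5, 4, 3, 1,
          2, 3, 5, 4, 6, 8, 5, 3, 2, 4, 3, 2, 1, 3, 4]
draws, acceptance_rate = random_walk_metropolis(counts, (1.0, 0.3, 0.3), step=0.05, n_iter=2000)
print(f"acceptance rate {acceptance_rate:.2f}")
```

Because proposals outside the constraint set receive zero prior density, every retained draw automatically satisfies the stationarity restriction, which is the practical meaning of imposing constraints through the prior.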
Note that we choose
Model selection and diagnostic checking both play crucial roles in empirical applications. To select the best-fitted model among all competing models, we employ DIC, POR and MSSR. First, we express DIC as follows.
where
Bayesian hypothesis testing can represent a special case of model comparison where a model refers to a likelihood function and a prior distribution. We offer a comparison of
where
where
We favour the model with the smallest DIC value. To check the adequacy of the model, we calculate the standardized Pearson residuals proposed by Jung et al. (2006):
If the model is correctly specified, then these residuals should have mean zero and variance one with no significant serial correlation in
where the last equation is a simulation-based estimate, and
We conduct a simulation study to examine the effectiveness of MCMC procedures for the following scenarios: (a) integer-valued transfer functions with GP or NB distributions; (b) one or multiple interventions with known or unknown locations; (c) the four types of interventions based on Inv-2A, Inv-2B, Inv-4A and Inv-4B; and (d) model misspecification.
Simulation results for integer-valued transfer function models with known location obtained from 200 replications
We organize simulation studies by using a sample of size
where D denotes a distribution, and
Simulation results for multiple interventions obtained from 200 replications
The two interventions, P1t and P2t in Eq. (4.1), are known locations and are given as follows:
For different values of
The results show a successful estimation for single and multiple intervention models (Tables 1 and 2). The average posterior means (medians) are overall satisfactorily close to the true values of the parameters, and all true values are within the 95% credible intervals, indicating that the MCMC method is reasonably effective.
The estimate of
Simulation results of the mis-specified models obtained from 200 replications
We treat the location of
Simulation results for GP integer-valued transfer function models with unknown location of T obtained from 200 replications
Simulation results for integer-valued transfer function models with unknown location of T obtained from 200 replications
Simulation study of multiple interventions for unknown locations of T1 and T2
Simulation results of the misspecified distribution with unknown location of T obtained from 200 replications
When we fit an integer-valued transfer function model to a set of data, the true order of the underlying process is often unknown; hence, it is likely to be misspecified. Table 8 now scrutinizes the effects on estimation in misspecified autoregressive order for the following model.
Simulation results of the misspecified autoregressive order with unknown location of T obtained from 200 replications
where
Time plots of Data 1, Data 2 and Data 3 from January 1995 to March 2020
To demonstrate the proposed methodology, we use monthly crime counts of Albury City in NSW Australia from the NSW Bureau of Crime Statistics and Research (
Descriptive statistics for data examples
Data 2 lists monthly counts of sexual offenses, and we specify that the intervention starts at
Table 10 presents results of model selection for Data 1 and 2 when the locations of interventions are known. Results show that the NB integer-valued transfer function model (Inv-1) and NB integer-valued transfer function model (Inv-2A) are the favoured models for Data 1 and 2, respectively, in terms of the lowest DIC. Bayesian estimation for Data 1 and 2 when time
MCMC results for Data 1–2 for the known location of intervention
MCMC estimation for the candidate models with unknown location of intervention
Model comparison for the unknown location of intervention
Concerning Data 3, we suspect there are multiple abrupt start and abrupt decay outliers that are sparsely spread throughout the data. We further consider multiple interventions,
We provide parameter estimation and model comparisons for the candidate models in Tables 13 and 14 for Data 3. We detect that the locations of interventions occur at
Bayesian estimates of the multiple intervention model for Data 3
Model comparisons for Data 3: The candidate models with unknown locations of interventions
Diagnostic checking for standardized residuals. Upper panel: Data 1 is based on the NB integer-valued transfer function (Inv-1); lower panel: Data 2 is based on the NB integer-valued transfer function (Inv-2A)
Data 3: The estimated locations of two interventions and diagnostic checking for standardized residuals based on the GP integer-valued transfer function (Inv-4B) and (Inv-2A)
We further examine the residuals of the best fitted model for each dataset, chosen as the model with the lowest DIC value. Figures 3 and 4 present the time plots of the standardized residuals as well as the sample ACFs of the residuals based on the best fitted models. All of the standardized residuals are uncorrelated. On the basis of the diagnostic checking plots, we conclude that the proposed models are adequate. Figure 4 also shows the estimated locations of two interventions for Data 3. It is interesting to note that the average count after December 2004 is double that before. Our analysis indicates a gradual start permanent effect beginning at
This research sets up a transfer function model for time series of counts, specifically focusing on the four types of intervention effects. We herein propose a Bayesian MCMC method based on GP, log-linear GP and NB integer-valued transfer function models to estimate and detect these intervention effects. The adaptive MCMC algorithms give reliable and accurate estimates for all unknown parameters both for known and unknown locations of interventions. The simulation results also indicate that the estimate of location is not sensitive to distribution misspecification. We apply the proposed method to crime datasets and select the favoured models by using three model selection criteria (DIC, MSSR and POR). The empirical outcome also reveals that the proposed models are able to successfully detect the locations and type of interventions. Regarding the GP or NB integer-valued transfer function model, each has its own merits. As a final remark, one can obtain a one-step-ahead prediction,
Footnotes
Acknowledgements
We thank the editor, the associate editor and the anonymous referee for their valuable time and constructive comments on our article, which have led to an improved version of it.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship and/or publication of this article: Cathy W. S. Chen's research is supported by the Ministry of Science and Technology, Taiwan (MOST109-2118-M-035-005-MY3). Aljo Clair Pingal would like to acknowledge Mindanao State University Iligan Institute of Technology for the faculty development programme scholarship.
