The soft-clipping binomial INGARCH (scBINGARCH) models are proposed as time series models for bounded counts, which have a nearly linear structure and also allow for negative autocorrelations. Conditions that guarantee the existence and certain mixing properties of the scBINGARCH process are derived, and further stochastic properties are discussed. The consistency and asymptotic normality of maximum likelihood estimators are established, and finite-sample properties are studied with simulations. The practical relevance of the scBINGARCH model’s ability to allow for negative parameter and ACF values is demonstrated by some real-data examples.
Count time series have attracted a lot of interest in research and practice during the last decades, see Weiß (2018) for a survey. These quantitative time series have a range consisting of either the full set of non-negative integers (unbounded counts) or a finite subset thereof (bounded counts). Many count time series models are inspired by the traditional autoregressive moving-average (ARMA) models for real-valued time series, that is, they are defined such that the upcoming observation depends linearly (in some sense) on past observations and possibly further information. This might be achieved by defining ARMA-like recursions using so-called ‘thinning operations’, or by utilizing a regression approach to ensure a linear conditional mean. The latter models are often referred to as the integer-valued generalized autoregressive conditional heteroskedasticity (INGARCH) models, although they are, in fact, closely related to the ARMA models, also see the discussion on p. 74 in Weiß (2018). The distinguishing feature of such linear count time series models is an ARMA-like autocorrelation structure, that is, their autocorrelation function (ACF) satisfies a set of Yule–Walker equations. As a consequence, moment properties are easily expressed by closed-form formulae. But unlike the original ARMA models, it is problematic (if not even impossible) to allow for negative ACF values, which, in turn, is caused by parameter constraints that are necessary to ensure non-negative outcomes (counts) for the data-generating process. More precisely, for unbounded counts, an exactly linear model with negative AR parameters is not possible, as past observations might be arbitrarily large such that we would end up with a negative conditional mean, which is not allowed for a count random variable (r. v.). For bounded counts with their additional reflecting barrier at the upper limit, by contrast, negative ACF values can be achieved, but only under rather restrictive conditions.
Examples are the binomial AR (BinAR) model (McKenzie, 1985) and the binomial INARCH (BINARCH) model (Weiß & Pollett, 2014; Ristić et al., 2016), which are discussed in more detail later in Section 3. If one does not insist on conditional linearity, then one can certainly get rid of such restrictions on parameter and ACF values. An example is given by the binomial logit-ARCH model (Chen et al., 2020), which uses a logit link to ensure the boundedness of the range; also see Chen et al. (2022) for a more general discussion. But then, parameter values are more difficult to interpret, and closed formulae for the ACF and further moments are not available. Further recent articles on bounded INGARCH models are Liu et al. (2022a, 2022b).
To resolve the ‘linearity versus negative ACF’ dilemma for the case of unbounded counts, Weiß et al. (2022) recently proposed the so-called ‘softplus INGARCH model’ to enable both a nearly linear structure and negative ACF values (approximately satisfying the Yule–Walker equations), which is achieved by using the softplus link for the model definition (see Section 2 for further details). However, the softplus approach cannot be applied to bounded counts as the softplus function is not bounded from above. Therefore, in this article, we develop a novel model family for time series of bounded counts, referred to as the soft-clipping INGARCH models, which are nearly linear and allow for negative ACF values at the same time. In Section 2, we motivate our approach and derive its relation to the soft-clipping function as well as to a type of mollified uniform distribution (also see Appendix A). The soft-clipping function has been discussed previously in the context of neural network activation functions (Klimek & Perelstein, 2020). In particular, we derive conditions that guarantee the existence and certain mixing properties of the soft-clipping binomial INGARCH (scBINGARCH) process. Then, we focus on important special cases. In Section 3, the scBINARCH(1) model, which constitutes a rather well-behaved finite Markov chain allowing for likelihood inference, is compared to its exactly linear competitors, the aforementioned BinAR and BINARCH models, which assume an exactly linear relationship between the conditional mean and the past observations. Afterwards, in Section 4, we show how to extend our findings to the higher-order Markovian scBINARCH models, which allow modelling of a wide range of autocorrelation structures. In Section 5, in turn, we consider the scBINGARCH(1,1) model, which includes a feedback term and, thus, allows capturing some kind of ‘long memory’. The practical relevance of the scBINGARCH’s ability to allow for negative parameter and ACF values is demonstrated in Section 6 with some real-data examples.
Finally, Section 7 concludes and outlines issues for future research.
The soft-clipping approach for bounded counts
Motivation
To motivate our proposed solution for time series consisting of bounded counts, let us start with a look at INGARCH-type models for processes of unbounded counts. Their basic idea is to define the conditional mean M_t = E[X_t | F_{t−1}] at time t, with F_{t−1} being the σ-field generated by the past observations X_{t−1}, X_{t−2}, …, by the recursive scheme
Then, the upcoming count value X_t is generated according to some count distribution having the mean M_t, for example, the Poisson distribution Poi(M_t). Choosing the response function as the identity leads to the ordinary and exactly linear INGARCH model of Ferland et al. (2006), but then, non-negativity constraints on the model parameters are required to ensure that M_t > 0. An intuitive way to overcome these parameter constraints is to choose the response function in a way that it ensures the correct range of M_t for any parameter values. One might choose the exponential function (log-linear model), but this leads to a distinctly non-linear structure. Another idea would be to choose the rectified linear unit (ReLU) function max{x, 0}, but then M_t = 0 might happen, which leads to a degenerate count distribution. Therefore, Weiß et al. (2022) proposed to define the response function as the softplus function sp_c(x) = c · ln(1 + e^{x/c}) with adjustment parameter c > 0 (see Mei & Eisner, 2017), which approaches max{x, 0} for c → 0, also see Figure 1(a).
Plots of (a) softplus functions sp_c, (b) reversed softplus functions 1 − sp_c(1 − x) and (c) rescaled soft-clipping functions, for different values of the adjustment parameter c
Because of the bounded range, one may also use the following equivalent characterization based on the normalized conditional mean, π_t = M_t / n, together with the recursive scheme:
Then, the counts are emitted by, for example, the conditional binomial distribution Bin(n, π_t); further options for the conditional distribution of X_t are briefly discussed in Section 7. Using the identity as the response function (Weiß & Pollett, 2014; Ristić et al., 2016) requires strict parameter constraints, whereas the choice of the inverse logit function (Chen et al., 2020) leads to a distinctly non-linear model. It is also clear that neither the identity nor the softplus function could control an additional upper bound, whereas the reversed softplus function 1 − sp_c(1 − x), which approaches min{x, 1} for c → 0 (also see Figure 1(b)), would be unbounded from below. Defining the response function as the clipped ReLU (cReLU) function min{max{x, 0}, 1} (Cai et al., 2017) causes the problem of a possibly degenerate count distribution with all probability mass in either 0 or n, also recall the above discussion of the ReLU function. Furthermore, cReLU is not continuously differentiable in 0 and 1. Thus, the idea is to combine the softplus function and its reversed version in such a way that we end up with a smoothed type of cReLU. Here, it is crucial to note the decomposition min{max{x, 0}, 1} = max{x, 0} − max{x − 1, 0}, which is easily verified. Inspired by this, we choose the response function in (2.2) as
namely as the soft-clipping function sc_c(x) = sp_c(x) − sp_c(x − 1) = c · ln((1 + e^{x/c}) / (1 + e^{(x−1)/c})), as plotted in Figure 1(c). Properties of the soft-clipping function (A.1) are discussed in Appendix A, where we also show the relation to a type of mollified uniform distribution (in analogy to the relation between the logit link and the logistic distribution). To sum up, we refer to a model following (2.2) as a soft-clipping INGARCH model if the response function is equal to
Equivalently, the soft-clipping INGARCH model can be defined based on the normalized conditional mean in (2.3), by using
For a numerically more stable implementation of (2.4) and (2.5), we use (A.4).
Adapting the properties of the soft-clipping function in Appendix A, we know that the response function in (2.4) takes values strictly between its bounds, that it approaches the clipped identity cReLU for c → 0, also see Figure 1(c), and that it is differentiable up to any order. The maximal deviation of (2.4) from the clipped identity is located in x = 0 and x = 1 (of opposite sign), and it is of absolute size ≈ c · ln 2. Analogous results hold for the normalized version, that is, for the response function in (2.5), which again approaches the corresponding clipped identity for c → 0, with maximal deviation in 0 and 1.
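For readers who want to experiment numerically, the soft-clipping response can be sketched in a few lines. The softplus-difference form below is numerically stable; the function names, the NumPy vectorization and the convention that the adjustment parameter c → 0 recovers the clipped identity follow our reading of the definitions above, so treat this as an illustrative sketch rather than the authors’ reference implementation.

```python
import numpy as np

def softplus(z):
    """Numerically stable softplus ln(1 + exp(z))."""
    z = np.asarray(z, dtype=float)
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def soft_clip(x, c=0.1):
    """Soft-clipping function: a smooth version of min{max{x, 0}, 1}.

    Uses the decomposition min{max{x, 0}, 1} = max{x, 0} - max{x - 1, 0},
    with each ReLU replaced by a softplus.  Small c gives a nearly linear
    response on (0, 1), while the output stays strictly inside (0, 1).
    """
    x = np.asarray(x, dtype=float)
    return c * (softplus(x / c) - softplus((x - 1.0) / c))
```

By construction, soft_clip is strictly increasing, point symmetric around (0.5, 0.5), and deviates from the clipped identity by about c · ln 2 at x = 0 and x = 1, matching the properties listed above.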
The soft-clipping binomial INGARCH model
Let the count process with range {0, …, n} be defined by (2.3) and (2.5). If the counts are emitted by the binomial distribution Bin(n, π_t), then we abbreviate the model as ‘soft-clipping BINGARCH’ (scBINGARCH), with ‘B’ like binomial. So, the scBINGARCH model is defined by
While the model would be well-defined also without further restrictions, we assume mild constraints on the dependence parameters and on c > 0 to prevent a degenerate behaviour. In (2.6), for the sake of readability, we simply write c for the adjustment parameter. While α_1, β_1 play the role of dependence parameters, α_0 can be understood as the (normalized-)mean parameter, see the discussion in the last paragraph of Section 2.2. Thus, to ensure the interpretability of α_0, a further constraint on the intercept term appears to be reasonable. Note that sc_c is point symmetric around (0.5, 0.5) (Appendix A), so the ‘mirrored version’ of model (2.6), that is, the process of reflected counts (n − X_t), satisfies a model of the same form with the same dependence parameters as in (2.6) and a modified intercept term.
Lemma 1. With the aforementioned parameter constraints, and with c > 0, it follows that there exist bounds 0 < l ≤ u < 1 such that the normalized conditional mean always satisfies l ≤ π_t ≤ u.
We have π_t ∈ (0, 1), and for a cReLU-INGARCH model, it would also be possible that π_t reaches the bounds 0 and 1 of this range. However, Lemma 1 states that for a scBINGARCH model with positive c, the conditional normalized mean is truly bounded away from 0 and 1. The proof of Lemma 1 is simple: considering the extreme parameter scenarios and the extreme cases for the past of the process, and using that sc_c is strictly monotone increasing, the stated bounds follow. Note that these bounds are only established to obtain theoretical results, but they do not imply any relevant restrictions for practice: even for rather extreme parameter scenarios, the resulting lower bound deviates only negligibly from zero.
As another preliminary step towards proving the existence of the scBINGARCH process defined by (2.6), we present the following auxiliary result about the total variation (TV) distance between two binomial distributions.
Lemma 2. Let X ∼ Bin(n, p) and Y ∼ Bin(n, q) with p, q ∈ [ε, 1 − ε] for some ε > 0. Then, the total variation distance between the distributions of X and Y can be bounded in terms of |p − q|.
The proof of Lemma 2 is provided in Appendix B.1. Note that ε just denotes a truly positive lower bound for the success probabilities. For (2.6), the existence of such a lower bound is ensured by Lemma 1.
Now we proceed in analogy to Weiß et al. (2022) and derive a theorem stating the existence and uniqueness of a stationary distribution of (2.6), as well as its absolute regularity (see Bradley, 2005 for a survey on strong-mixing concepts), provided that some regularity conditions are satisfied. For this purpose, we show that conditions (A1)–(A3) in Doukhan & Neumann (2019) hold such that their Corollary 2.1 as well as Theorems 2.1 and 2.2 are applicable. Our results are summarized in Theorem 1 as follows.
Theorem 1. If the scBINGARCH process (2.6) satisfies the aforementioned parameter constraints together with an appropriate contraction condition, then the following assertions hold:
The Markov process defined by (2.6) possesses a unique stationary distribution;
A stationary version of the process is absolutely regular, with β-mixing coefficients bounded by C·κ^h for some constant C > 0, some κ ∈ (0, 1), and time lag h;
A stationary version of the process is ergodic.
The proof of Theorem 1 is provided in Appendix B.2. As in Weiß et al. (2022), let us point out that would be a sufficient (but not necessary) condition for Theorem 1 to hold.
Before discussing important special cases of the scBINGARCH family in more detail, let us present some further general results. By definition (2.6), the conditional mean and variance of X_t are given by n·π_t and n·π_t(1 − π_t), respectively. Here, it is interesting to compare with the ordinary BINGARCH model, which uses a linear link. Let π_t^lin denote the normalized conditional mean of such a truly linear model. Then, the conditional variance of the scBINGARCH model is larger than if using a linear link. This can be seen by considering that the binomial variance function p(1 − p) is maximized at 0.5, and that the soft-clipping function maps each value towards 0.5: because of the curvature of sc_c, we have |sc_c(x) − 0.5| ≤ |x − 0.5| for x ∈ (0, 1).
There is no explicit formula for the unconditional mean, but our computations show that it is generally close to the value obtained by plugging the parameters into the formula for the unconditional mean of the exactly linear BINGARCH model (Ristić et al., 2016, Theorem 8). This approximation can also be justified as follows. By Taylor’s formula applied to sc_c, the first derivative is close to 1 and the higher-order derivatives are close to 0 in the central range for small c. Thus, a second-order Taylor approximation yields the same relation for the unconditional mean as in a truly linear model. Analogously, the Yule–Walker equations in Theorem 9 and Example 2 of Ristić et al. (2016) are used to approximate the autocorrelation properties of the scBINGARCH process, irrespective of any parameter constraints.
The soft-clipping BINARCH(1) model
As the first important special case of the scBINGARCH family (2.6), we discuss the scBINARCH(1) model (that is, the first-order purely autoregressive case without feedback term), the model recursion of which can be summarized as
Since 0 < π_t < 1 is guaranteed (also recall Lemma 1), the transition probabilities of this finite Markov chain are always truly positive, that is, the chain is primitive and thus ergodic with a unique stationary solution. In addition, it is also φ-mixing with geometrically decreasing weights (Weiß, 2018), which strengthens Theorem 1. Obviously, a vanishing dependence parameter leads to independent and identically distributed (i. i. d.) binomial counts, so we have a true Markov model only if the dependence parameter is non-zero. But otherwise, no parameter restrictions are required. Note that the above conclusions would also hold if taking the beta-binomial (zero-inflated binomial) distribution for emitting the counts, that is, with an additional dispersion (zero) parameter, because this distribution has a truly positive probability mass function on the whole range {0, …, n} as well.
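The model recursion can be illustrated by a short simulation sketch. The recursion form π_t = sc_c(α_0 + α_1 · X_{t−1}/n) is our reading of the normalized-mean definition, and all function names and parameter values are illustrative rather than taken from the paper.

```python
import numpy as np

def softplus(z):
    z = np.asarray(z, dtype=float)
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def soft_clip(x, c=0.1):
    x = np.asarray(x, dtype=float)
    return c * (softplus(x / c) - softplus((x - 1.0) / c))

def simulate_scbinarch1(T, n, a0, a1, c=0.1, rng=None):
    """Simulate an scBINARCH(1) path of length T under the assumed
    recursion pi_t = sc_c(a0 + a1 * X_{t-1}/n), X_t | past ~ Bin(n, pi_t).
    Negative a1 is allowed and produces negative autocorrelation."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(T, dtype=int)
    x[0] = rng.binomial(n, float(soft_clip(a0, c)))   # arbitrary start-up
    for t in range(1, T):
        pi_t = float(soft_clip(a0 + a1 * x[t - 1] / n, c))
        x[t] = rng.binomial(n, pi_t)
    return x
```

With a negative dependence parameter (e.g. a1 = −0.6), the sample lag-1 ACF of the simulated path is clearly negative, which a linear BINARCH model could not produce.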
Model properties
We shall now compare the properties of this nearly linear model with two well-established exactly linear finite Markov chains: the ordinary BINARCH model (Weiß & Pollett, 2014) defined by using the identity link instead of the soft-clipping one,
and the BinAR(1) model by McKenzie (1985). The latter uses the binomial thinning operation ‘∘’ (Steutel & van Harn, 1979) for model definition, where α ∘ X given X follows the distribution Bin(X, α) for α ∈ (0, 1). Let π ∈ (0, 1) and ρ denote the marginal success probability and the dependence parameter, respectively, and define β = π(1 − ρ) as well as α = β + ρ. Then, the BinAR(1) model is defined by
Setting the parameters accordingly, the BinAR(1) and BINARCH(1) model have the same mean and ACF, namely n·π and ρ^k at lag k, and also their parameter constraints agree (thus, they have a common grey region in Figure 2). But they differ in their variance: while the BinAR(1) model has the binomial variance n·π(1 − π), the BINARCH(1) model exhibits extra-binomial variation, see Weiß & Pollett (2014). Equivalently, the binomial index of dispersion (BID), defined as the variance divided by the corresponding binomial variance, equals 1 for the BinAR(1) model and exceeds 1 for the BINARCH(1) model. Note that the ACF might be negative at odd lags, because ρ might be negative. However, the maximal extent of negativity depends on the actual normalized mean, see the grey regions in Figure 2. The black regions in Figure 2 are the corresponding moment properties of the scBINARCH(1) model, which have been computed numerically exactly by utilizing the scBINARCH model’s Markov property: solving the invariance equation corresponding to scBINARCH’s transition matrix, we get the stationary marginal distribution and, thus, arbitrary lagged distributions, from which we compute the discrete moments by simple summation. The black region consists of dots computed for a fine grid of parameter values.
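The numerically exact computation just described can be sketched as follows: build the (n+1)×(n+1) transition matrix, solve the invariance equation via the left eigenvector for eigenvalue 1, and obtain moments by simple summation. The recursion form and the parameter values are our assumptions for illustration.

```python
import numpy as np
from math import comb, log

def softplus(z):
    z = np.asarray(z, dtype=float)
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def soft_clip(x, c=0.1):
    x = np.asarray(x, dtype=float)
    return c * (softplus(x / c) - softplus((x - 1.0) / c))

def binom_pmf(n, p):
    """Binomial pmf evaluated in log-space for numerical accuracy."""
    k = np.arange(n + 1)
    logpmf = (np.array([log(comb(n, j)) for j in k])
              + k * np.log(p) + (n - k) * np.log1p(-p))
    return np.exp(logpmf)

def scbinarch1_stationary(n, a0, a1, c=0.1):
    """Stationary distribution, mean, variance and lag-1 ACF of an
    scBINARCH(1) chain with assumed recursion pi_i = sc_c(a0 + a1*i/n).

    Solves the invariance equation m P = m via the unit left eigenvector
    of the transition matrix P[i, j] = Bin(n, pi_i)(j)."""
    P = np.vstack([binom_pmf(n, float(soft_clip(a0 + a1 * i / n, c)))
                   for i in range(n + 1)])
    vals, vecs = np.linalg.eig(P.T)
    m = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    m = m / m.sum()                       # stationary marginal distribution
    k = np.arange(n + 1)
    mu = m @ k
    var = m @ ((k - mu) ** 2)
    # lag-1 joint law is m[i] * P[i, j], so the covariance is a double sum
    cov1 = ((m[:, None] * P) * np.outer(k - mu, k - mu)).sum()
    return m, mu, var, cov1 / var
```

For a negative dependence parameter, the resulting lag-1 ACF is negative, and the normalized mean stays close to the linear-model approximation.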
Plots of attainable pairs of the lag-1 ACF value against the normalized mean for different upper bounds n, where the black region corresponds to the scBINARCH model and the grey region to the BinAR model (or equivalently, the BINARCH model)
Before looking at the scBINARCH model’s potential for negative ACF values, a more general discussion is necessary. In contrast to, for example, the Gaussian AR(1) model, where the lag-1 ACF can attain any value within (−1, 1) independent of the actual mean, the possibility for negative ACF values is usually limited for discrete-valued processes (also see Lin et al., 2014, Section 2). For n = 1, all three models agree with a (re-parametrized) general binary Markov chain, that is, the grey region in Figure 2 cannot be exceeded in that case (for negative ACF values of more general binary time series, also see Jentsch & Reichmann, 2019). For n > 1, the BinAR and BINARCH model are still not able to exceed this grey region, whereas the scBINARCH model reaches more and more pairs with increasing n, see the black regions in Figure 2. So this model is much more flexible with respect to negative ACF values.
The next question to be analyzed is the ‘extent of linearity’ that is achieved by the scBINARCH(1) model. For this purpose, for diverse model parameterizations, we computed the values of the normalized mean, the BID, and the partial ACF (PACF) at lags 1–3. These are compared with the corresponding ‘linear values’, that is, the values obtained by plugging the parameters into the above formulae of the BINARCH(1) model (irrespective of any parameter constraints). Note that the latter model has PACF values equal to 0 for lags beyond the model order. The obtained results are summarized in the tables of Supplement S.1, where we have to compare the ‘sc’ values to the ‘lin’ values. It can be seen that the linearity improves if the normalized mean approaches 0.5, which is reasonable in view of the areas plotted in Figure 2. It is also plausible that the linearity improves with increasing n. But the most important factor is certainly given by the adjustment parameter c, recall Figure 1(c), where the values printed for the boundary case c = 0 were computed by using the response function cReLU instead of sc_c. This boundary case is not relevant for applications in practice because of the problems explained in Section 2, but it serves as the ‘best case’ regarding the attainable linearity. Comparing the considered choices of c, it can be seen that there are sometimes notable deviations from linearity for larger values of c, whereas small values of c often lead to nearly identical values as the boundary case. So for practice, a small c appears to be a reasonable choice to obtain a nearly linear model. Nevertheless, there are a few scenarios where we observe deviations from linearity even for small c, namely low n (especially the binary case) and normalized means distant from 0.5 in the case of strongly negative dependence parameters. This can be explained from our discussion of Figure 2, where we realized that Markov chains cannot reach arbitrarily negative ACF values in such scenarios. But except for these extreme cases, the scBINARCH model with low c does very well in imitating linearity.
Likelihood-based statistical inference
Results regarding likelihood-based inference, especially about the asymptotics of the conditional maximum likelihood (CML) estimator of the scBINARCH(1) model’s parameter vector, follow immediately once Condition 5.1 of Billingsley (1961) has been shown. Let us focus on parameter estimation from a given time series of length T here. The scBINARCH(1) model’s conditional log-likelihood function is given by
Here, the transition probabilities of this finite Markov chain enter (which are truly positive for any admissible parameter values). The CML estimate is obtained by numerically maximizing the conditional log-likelihood, where approximate moment estimates (obtained by applying the approximate moment relations of Section 3.1 to the respective sample moments) might be used as initial values. Then, the existence of a consistent CML estimator is ensured, almost surely, which is also asymptotically normally distributed according to
where I(θ) denotes the expected Fisher information. (3.5) follows from Theorems 2.1 and 2.2 of Billingsley (1961), and I(θ) is non-singular in the sense of Billingsley (1961, p. 24); the proof of this assertion is provided in Appendix B.3.
Simulation results (with 10 000 replications per scenario) for checking the finite-sample performance of the CML estimator are summarized in Supplement S.2. It can be seen that the CML estimation performs rather well even for small sample sizes. We have a low bias, quickly decreasing standard errors (s. e.), and the approximate s. e. obtained from the Hessian of the log-likelihood agree quite well with the simulated s. e. in the mean.
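A CML fit along these lines can be sketched as below. The conditional log-likelihood follows the binomial transition probabilities; instead of a library optimizer we use a crude nested grid search to keep the sketch dependency-free, with moment-type start values as suggested above. The recursion form and all names are our illustrative assumptions.

```python
import numpy as np
from math import comb, log

def softplus(z):
    z = np.asarray(z, dtype=float)
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def soft_clip(x, c=0.1):
    x = np.asarray(x, dtype=float)
    return c * (softplus(x / c) - softplus((x - 1.0) / c))

def neg_cond_loglik(theta, x, n, c=0.1):
    """Negative conditional log-likelihood of an scBINARCH(1) model,
    conditioning on the first observation."""
    a0, a1 = theta
    pi = soft_clip(a0 + a1 * x[:-1] / n, c)        # pi_2, ..., pi_T
    y = x[1:]
    log_binom = np.array([log(comb(n, j)) for j in y])
    return -np.sum(log_binom + y * np.log(pi) + (n - y) * np.log1p(-pi))

def fit_scbinarch1(x, n, c=0.1, iters=6):
    """CML estimation of (a0, a1) by a simple nested grid search;
    in practice one would hand neg_cond_loglik to a numerical optimizer."""
    center = np.array([x.mean() / n, 0.0])         # moment-type start values
    width = np.array([1.0, 1.0])
    best = tuple(center)
    for _ in range(iters):
        g0 = np.linspace(center[0] - width[0], center[0] + width[0], 11)
        g1 = np.linspace(center[1] - width[1], center[1] + width[1], 11)
        _, best = min((neg_cond_loglik((a0, a1), x, n, c), (a0, a1))
                      for a0 in g0 for a1 in g1)
        center, width = np.array(best), width / 5.0
    return np.array(best), -neg_cond_loglik(best, x, n, c)
```

Since the soft-clipping response is strictly inside (0, 1), the log-likelihood is always finite, so no boundary handling is needed during the search.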
Remark 1. Note that our main intention is to use the scBINGARCH models as approximations of truly linear models. Thus, we specify the adjustment parameter c in advance, chosen sufficiently small such that approximate linearity holds (as recommended in Section 3.1). If one wants to use a scBINGARCH model as a truly non-linear model, with larger c, then it would be relevant to include c into estimation. This issue is briefly discussed at the end of Appendix B.3. Our simulation experiments showed that a reasonable finite-sample performance is only achieved if c is sufficiently distant from 0. This is plausible as we recognized in Section 3.1 that models with small c are hard to distinguish in their stochastic properties.
Higher-order soft-clipping BINARCH models
According to (2.6), the scBINARCH(1) model discussed previously in Section 3 is extended to a pth-order Markov model with p ≥ 1 by the recursive scheme
Model (4.1) constitutes a counterpart to the truly linear BINARCH(p) model of Ristić et al. (2016) but with much weaker parameter constraints. In particular, the moment properties in Theorems 1–2 of Ristić et al. (2016) hold true in approximation, even if negative parameter values are employed. As in Section 3, results exceeding those of Theorem 1 are derived by utilizing the Markov properties of the scBINARCH(p) model. The idea is to consider the process of vectors (X_{t−p+1}, …, X_t), which again constitutes a finite Markov chain. The corresponding 1-step-ahead transition probabilities are non-zero iff the first p − 1 components of the new state vector coincide with the last p − 1 components of the previous one, namely
Furthermore, the p-step-ahead transition probabilities are truly positive throughout. Thus, we can conclude on the existence of a unique stationary solution (ergodic and φ-mixing) as well as on likelihood inference like in Section 3, also see the analogous arguments in Ristić et al. (2016).
To get an idea about the abilities of the scBINARCH(2) model for explaining different autocorrelation structures, we created some ACF–ACF plots in Figure 3, in analogy to Figures 6(b) and 7 of Jentsch & Reichmann (2019). More precisely, for a grid of parameter values, we computed the values of ρ(1) and ρ(2) and plotted them against each other. Note that the ordinary BINARCH model of Ristić et al. (2016) only allows for non-negative values of the AR parameters, that is, it only allows to achieve pairs in the top-right quadrant of Figure 3. So it becomes clear that the novel scBINARCH model is much more flexible w. r. t. the achievable autocorrelation structures, with increasing range of ACF pairs for increasing n in analogy to Figure 2.
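Such (ρ(1), ρ(2)) pairs can be computed exactly via the vectorized chain, as sketched below under our assumed second-order recursion π_t = sc_c(a0 + a1·X_{t−1}/n + a2·X_{t−2}/n); the state for the vectorized chain is the pair (X_{t−1}, X_t).

```python
import numpy as np
from math import comb, log

def softplus(z):
    z = np.asarray(z, dtype=float)
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def soft_clip(x, c=0.1):
    x = np.asarray(x, dtype=float)
    return c * (softplus(x / c) - softplus((x - 1.0) / c))

def binom_pmf(n, p):
    k = np.arange(n + 1)
    logpmf = (np.array([log(comb(n, j)) for j in k])
              + k * np.log(p) + (n - k) * np.log1p(-p))
    return np.exp(logpmf)

def scbinarch2_acf(n, a0, a1, a2, c=0.1):
    """Exact (rho(1), rho(2)) of a second-order scBINARCH model via the
    vectorized finite Markov chain on pairs of consecutive counts."""
    S = n + 1
    k = np.arange(S)
    pmf = {(i, j): binom_pmf(n, float(soft_clip(a0 + a1 * j / n
                                                + a2 * i / n, c)))
           for i in range(S) for j in range(S)}
    P = np.zeros((S * S, S * S))        # transition (i, j) -> (j, k)
    for i in range(S):
        for j in range(S):
            P[i * S + j, j * S:(j + 1) * S] = pmf[(i, j)]
    vals, vecs = np.linalg.eig(P.T)
    m = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    m = (m / m.sum()).reshape(S, S)     # joint law of consecutive pairs
    marg = m.sum(axis=0)                # stationary marginal distribution
    mu = marg @ k
    var = marg @ ((k - mu) ** 2)
    cov1 = (m * np.outer(k - mu, k - mu)).sum()
    cov2 = sum(m[i, j] * (i - mu) * (pmf[(i, j)] @ (k - mu))
               for i in range(S) for j in range(S))
    return cov1 / var, cov2 / var
```

With a1 > 0 and a2 < 0, the pair lands in a quadrant (ρ(1) > 0, ρ(2) < 0) that the ordinary BINARCH model cannot reach, and the values are close to the AR(2) Yule–Walker approximations ρ(1) = a1/(1 − a2), ρ(2) = a2 + a1·ρ(1).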
Plots of attainable pairs of ρ(2) against ρ(1) for the second-order scBINARCH model, for different upper bounds n
Including a feedback term: scBINGARCH(1,1) model
Within families of INGARCH-type models, the model order (1, 1) is a rather common choice in practice, as it constitutes a parsimonious way of providing some kind of ‘long memory’ (in the sense of a very slowly decreasing ACF like in Figure 4), see Fokianos (2011). This is achieved by including the feedback term β_1 π_{t−1} (or β_1 M_{t−1}, respectively) into the model recursion (2.6):
Plots of a sample path and ACF of scBINGARCH(1,1) processes for two parameter settings, shown in panels (a–b) and (c–d), respectively
The scBINGARCH(1,1) model in (5.1) is also a special case of the beta-binomial GARCH model in Chen et al. (2022), as the scBINGARCH process satisfies their contraction condition (2.4), see our derivations for (A2) in Appendix B.2. The feedback term in (5.1) results in a dependence of π_t on all past observations X_{t−1}, X_{t−2}, …, but with (approximately) exponentially decreasing weights for increasing time lag, see the discussions in Section 3 of Fokianos (2011) or Example 4.1.4 of Weiß (2018). For the illustrative examples shown in Figure 4, a long sample path was simulated (the time series plot refers to the first observations only) and used to compute the sample ACF. Since α_1 + β_1 is close to 1 in both cases, the ACF converges only slowly towards 0.
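A simulation sketch of the (1, 1) recursion with feedback term, under our assumed form π_t = sc_c(a0 + a1·X_{t−1}/n + b1·π_{t−1}) with illustrative parameter values:

```python
import numpy as np

def softplus(z):
    z = np.asarray(z, dtype=float)
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def soft_clip(x, c=0.1):
    x = np.asarray(x, dtype=float)
    return c * (softplus(x / c) - softplus((x - 1.0) / c))

def simulate_scbingarch11(T, n, a0, a1, b1, c=0.1, rng=None):
    """Simulate an scBINGARCH(1,1) path.  The feedback term b1 * pi_{t-1}
    makes pi_t depend on all past observations with (approximately)
    exponentially decreasing weights."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.empty(T, dtype=int)
    # start-up value taken from the linear-model mean a0 / (1 - a1 - b1)
    pi = float(soft_clip(a0 / max(1.0 - a1 - b1, 1e-8), c))
    x[0] = rng.binomial(n, pi)
    for t in range(1, T):
        pi = float(soft_clip(a0 + a1 * x[t - 1] / n + b1 * pi, c))
        x[t] = rng.binomial(n, pi)
    return x
```

Choosing a1 + b1 close to 1 (e.g. a1 = 0.3, b1 = 0.6) yields a sample ACF that decays only slowly with the lag, in line with the ‘long memory’ behaviour described above.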
The existence of a unique stationary solution (ergodic and β-mixing) is ensured by Theorem 1. But it could also be concluded from Propositions 1 and 2 in Davis & Liu (2016), as the conditional binomial distribution belongs to the one-parameter exponential family (and since the soft-clipping function satisfies a contraction condition, recall the proof of Theorem 1 in Appendix B.2). Theorems 1 and 2 in Davis & Liu (2016), in turn, are now used to conclude the following result about CML estimation.
Theorem 2. If the scBINGARCH(1,1) process (5.1) satisfies the conditions of Theorem 1, then the CML estimator is strongly consistent, and it is asymptotically normally distributed,
The proof of Theorem 2 is provided in Appendix B.4. Note that Theorems 1 and 2 in Cui & Zheng (2017) provide an extension of Davis & Liu (2016) to general model orders. For the model order (1, 1) considered in Theorem 2, in turn, also the results in Chen et al. (2022) could have been used for a proof.
The finite-sample performance of the CML estimator was again investigated by simulations (with 10 000 replications per scenario), see Supplement S.2. This time, however, in analogy to the findings of Weiß et al. (2022), we recognize that clearly larger sample sizes are required for achieving a good estimation performance. Then, the performance is better if α_1 + β_1 is much smaller than 1. For the simulation scenarios with α_1 + β_1 close to 1, we have a notable bias even for large sample sizes, especially for the feedback parameter. This is plausible as the feedback parameter controls the length of the process memory. It is also interesting to note that the bias for some parameters often increases with increasing n, while it remains stable for others. Simulated and approximate s. e., by contrast, agree quite well in the mean.
Data examples
Geyser eruption data
As our first data example, we consider the geyser eruption data analyzed by Jentsch & Reichmann (2019). This binary time series (so n = 1) refers to successive eruptions of the Old Faithful Geyser (data provided in R’s MASS package, https://cran.r-project.org/package=MASS), where the value 1 (0) was recorded if the eruption duration was at least (less than) three minutes. The geyser eruption data exhibit strong negative autocorrelations with significant sample PACF values for the first lags, see Figure 5. Therefore, Jentsch & Reichmann (2019) fitted the second-order autoregressive gbAR(2) model from their generalized binary ARMA (gbARMA) family to these data and showed the adequacy of this model fit. Jentsch & Reichmann (2019) developed their gbARMA family as an extension of the so-called NDARMA model by Jacobs & Lewis (1983): by adding a switching facility to the NDARMA’s multinomial selection mechanism, they circumvented the NDARMA’s disadvantage of only capturing positive ACF values. In what follows, we demonstrate that our novel scBINARCH model may serve as a ‘more user-friendly’ alternative to the gbAR model, while the scBINGARCH model leads to a different type of ACF than the gbARMA model (‘long memory’ vs traditional ARMA ACF).
Time series plot and sample PACF of geyser eruption data
Computing CML estimates for the AR parameters and the error term’s mean of the gbAR(2) model, we get estimates where the error term’s mean actually equals the upper bound used for the box constraints of the numerical optimization. The corresponding value of Akaike’s information criterion (AIC) is computed for later comparison. Note that Jentsch & Reichmann (2019) used the AIC for model order selection, with the result nicely confirming the outcome of the sample PACF in Figure 5. The conditional distribution of our scBINARCH(p) model (4.1) from Section 4 and, thus, numerical likelihood computations appear to be slightly easier to implement than for the gbAR model. Furthermore, the parameterization with the intercept parameter instead of the error term’s mean turns out to be advantageous, as we do not get in conflict with the box constraints. Fitting several model orders, we find the lowest AIC for the second-order model, and thus confirm the choice of the model order 2. The identical AIC values (up to the given numerical precision) for the gbAR and scBINARCH models follow from nearly identical stochastic properties of both model fits.
Within the scBINGARCH family, also the scBINGARCH(1,1) model (5.1) from Section 5 is a plausible candidate model, as the sample ACF values are slowly decaying in absolute extent. CML estimation leads to a model fit with a slightly lower AIC value. While this tiny advantage in terms of AIC should not be overestimated for practical purposes, it is interesting to note that the fitted scBINGARCH(1,1) model has ACF values closer to those of the sample ACF than the ACF values of the fitted scBINARCH model. Thus, in summary, the scBINARCH model can be used as an alternative to the gbAR model, as it leads to nearly identical stochastic properties except a reparameterization, while the full scBINGARCH model has the ability to describe a slowly decaying ACF.
Air quality data
As our second application, let us consider the air quality data (data provided as online supplementary material for Liu et al., 2022a) discussed by Liu et al. (2022a, 2022b). For 30 Chinese cities, they analyzed time series of daily air quality levels (December 2013–July 2019). Here, air quality is measured on an ordinal scale. But for modeling purposes, Liu et al. (2022a) used a ‘rank-count approach’ as in Weiß (2020), that is, the ordinal r. v. at time t in a given city is substituted by the corresponding rank count, a bounded-counts r. v. Then, they applied a linear INGARCH model to these data, where the conditional distribution is a truncated Poisson distribution with additional zero and one inflation. This ‘ZOB Poisson’ distribution was used by Liu et al. (2022a) as the lowest categories turned out to be dominant in the data (‘normalcy-dominant’ categorical data). Certainly, the conditional binomial distribution of our scBINGARCH model is not able to explain zero and one inflation, so we do not propose it as an alternative model for the full set of time series. But for some of the time series, our scBINGARCH model performs well even without the zero-one-inflation feature. Thus, our subsequent discussion shall focus on two such exemplary series. For future research, it is recommended to develop a zero-one-inflated scBINGARCH (scZOBINGARCH) model for dealing with the whole set of air quality time series.
We consider the count time series corresponding to Beijing and Zhengzhou as examples. Plots of these time series as well as their sample ACFs are provided in Figure 6. For the Zhengzhou series in (d), we have a slowly decaying ACF as to be expected for INGARCH-type data. For the Beijing series in (b), by contrast, the ACF decays rather quickly such that the model choice INGARCH is not clear in advance. But following Liu et al. (2022a), we fit an scBINGARCH(1,1) model to both time series.
Time series plot and sample ACF of air quality data for (a–b) Beijing and (c–d) Zhengzhou
For the Beijing series, we get a negative CML estimate for the feedback parameter. Such a negative value is not possible for the ZOB-INGARCH model of Liu et al. (2022a), as all parameters are restricted to non-negative values. Accordingly, the estimate ‘0.0000’ is reported in Table 3 of Liu et al. (2022a). Thus, it is interesting to compare the ACFs of the fitted models to the sample ACF in Figure 6(b), where the fitted models’ ACF was computed as the sample ACF of a long simulated sample path. Comparing the ACF values for lags 1–4, we recognize a better agreement for our scBINGARCH model, that is, the possibility for negative parameter values is clearly advantageous. Also the ACF values of the scBINGARCH’s standardized Pearson residuals in Figure 7(a) are not significant on a 5%-level, confirming that the serial dependence structure is captured adequately. Furthermore, the model’s conditional distribution appears adequate, as the residuals’ mean is close to 0, the variance is only slightly larger than 1, and the PIT histogram in Figure 7(b) is close to uniformity; see Weiß (2018, Section 2.4) for details on these diagnostic tools. While the conditional properties of the data are very well described by the fitted scBINGARCH model, it is not fully able to mimic the marginal distribution: we have a close agreement between the means, 0.317 versus 0.321 (model vs data), but the BIDs 1.297 versus 1.410 are a bit more distant. More flexibility w. r. t. zero and one inflation could help to better explain the marginal dispersion.
Model diagnostics for scBINGARCH fits: sample statistics of Pearson residuals in (a), and PIT histogram for (b) Beijing and (c) Zhengzhou
For the Zhengzhou series , by contrast, all CML estimates are positive, namely . Not only do we have a close agreement between the model’s and the data’s ACF, namely versus ; the non-significant values of the Pearson residuals’ ACF in Figure 7(a) also confirm the adequacy of the modelled dependence structure. Regarding the marginal distribution, we now have an even closer agreement between the means (0.391 vs 0.396) and BIDs (1.287 vs 1.260) than for the Beijing series. The reason might be that only few zeros are observed in Figure 6(c), that is, we do not have inflation in both zero and one. This time, however, there are some reservations with respect to the conditional distributions of : while the mean and variance of the Pearson residuals in Figure 7(a) are close to their target values 0 and 1, respectively, the PIT histogram in Figure 7(c) exhibits an asymmetric behaviour. So the mean and variance of the conditional distribution are adequately described by the scBINGARCH fit, but not its shape. To sum up, the scBINGARCH model is able to flexibly adapt to diverse types of dependence structure, because it is not subject to unduly stringent parameter restrictions. It is not a perfect choice for the air quality data, however, as it is not able to explain the zero and one inflation. The air quality data also seem to exhibit some yearly pattern, so the inclusion of covariates into the scBINGARCH model might be helpful in this regard. Thus, such extensions appear to be an interesting direction for future research.
Conclusions
The scBINGARCH family for time series of bounded counts was proposed as an extension of the linear BINGARCH model, as it gets by with less severe parameter restrictions. In particular, it behaves like the linear BINGARCH model for positive parameter values, but it also allows for negative parameter and thus ACF values. This was achieved by using the nearly linear soft-clipping function as the response function. It was shown that the moment properties of the scBINGARCH model are generally well approximated by simply using the moment formulae from the exactly linear model; this approximation is less accurate only for extreme parameter scenarios. We established the existence and mixing properties for the scBINGARCH model, as well as the consistency and asymptotic normality of the CML estimator. The finite sample performance of the CML estimator was studied by simulations, and the practical relevance of the scBINGARCH model was demonstrated with real-data examples.
However, the data examples also made clear that further research on the model is needed. In particular, the conditional binomial distribution seems to be too restrictive for some applications. Future research could therefore turn to the development of, for example, a beta-binomial scINGARCH model (that is, with an additional parameter for controlling the extent of extra-binomial variation), also see Chen et al. (2022), or a zero-inflated (or even zero-one-inflated) scINGARCH model (that is, with additional parameter(s) for controlling the zero (and one) probability), with the latter being motivated by the air quality data discussed in Section 6.2. Also, recall Section 3 regarding the Markov case. The air quality example also suggests a spatio-temporal extension, as there might be some spatial dependence between the air quality of the 30 cities.
Appendices
Soft clipping and mollified uniform distribution
The (unit) clipped ReLU function is defined by clipping the identity function to the interval , see Cai et al. (2017). A smoothed version of it, the soft clipping function, was proposed by Klimek & Perelstein (2020) as
Let us derive important properties of the soft clipping function. It satisfies
and it is point symmetric in . It approaches the clipped ReLU function for . It is infinitely differentiable, with the first derivative being given by
Thus, is strictly monotone increasing from 0 to 1, but its increase is weaker than that of on . Therefore, the maximal deviation between and is in , given by . Note that approaches the unit rectangular function for , where the indicator function equals 1 (0) if (). Finally, the soft clipping function is related to the softplus function by
in analogy to the decomposition . Inserting the equality of Wiemann et al. (2021) into (A.3), we can rewrite the soft clipping function as
which, together with the log1p function, allows for a numerically stable implementation of .
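As a sketch of such a numerically stable implementation: assuming the soft clipping function has the form sc_c(x) = (1/c) log((1 + e^{cx})/(1 + e^{c(x−1)})), as in Klimek & Perelstein (2020), it can be evaluated as a difference of softplus terms, each computed via log1p so that e^{cx} is never formed explicitly; the function names are illustrative.

```python
import math

def softplus(z):
    """Numerically stable softplus log(1 + e^z) using log1p:
    for z > 0, rewrite as z + log(1 + e^{-z}) to avoid overflow."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def soft_clip(x, c):
    """Soft clipping sc_c(x) = (softplus(c*x) - softplus(c*(x-1))) / c,
    equivalent to (1/c) * log((1 + e^{cx}) / (1 + e^{c(x-1)}))."""
    return (softplus(c * x) - softplus(c * (x - 1))) / c
```

The rewriting matters: a naive evaluation of the ratio of exponentials overflows already for moderate c·x, whereas the softplus form handles arbitrarily large arguments.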
Soft/mollified uniform distribution with scale parameter . Plots of (a) CDF and (b) PDF for different against . (c) Plots of variance and ‘out of unit interval’-probability against
Besides serving as a response function in regression modelling, the soft clipping function is also related to the uniform distribution. Note that just equals the cumulative distribution function (CDF) of the (unit) uniform distribution, , see Chapter 26 in Johnson et al. (1995). Since from (A.1) is itself a CDF, see Figure A1(a) for some plots, the corresponding distribution might be thought of as a ‘soft uniform distribution’, which has the full set of real numbers, , as its range but which approaches for . Its probability density function (PDF) is then given by (A.2), see Figure A1(b) for illustration. Its quantile function equals
where the latter version can again be implemented using the log1p function. It is easily seen that this distribution is equal to the convolution of with the logistic distribution having mean 0 and scale parameter , see Chapter 23 in Johnson et al. (1995). Therefore, the ‘soft uniform distribution’ might also be referred to as a ‘mollified uniform distribution’ in the sense of Friedrichs (1944), using a logistic mollifier. The decomposition with independent summands and can be utilized for moment calculations. For example, the mean and variance of the soft/mollified uniform distribution are given by and , respectively. Figure A1(c) plots the variance against (which converges to for ) as well as the probability of falling outside the unit interval, that is, .
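Under the same assumed closed form of sc_c, the quantile function can be obtained by inverting sc_c analytically, with expm1 keeping the evaluation stable near 0 and 1, and sampling is immediate from the decomposition X = U + L with U uniform on (0, 1) and L logistic with scale 1/c. The helper names below are illustrative; the variance formula Var = 1/12 + π²/(3c²) follows from the independence of the two summands.

```python
import math
import random

def soft_clip_cdf(x, c):
    """CDF of the mollified uniform distribution = soft clipping function sc_c."""
    sp = lambda z: z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))
    return (sp(c * x) - sp(c * (x - 1))) / c

def mollified_uniform_quantile(q, c):
    """Quantile function from inverting sc_c(x) = q in closed form:
    x = (1/c) * (log(e^{cq} - 1) - log(1 - e^{c(q-1)})), via expm1."""
    return (math.log(math.expm1(c * q)) - math.log(-math.expm1(c * (q - 1)))) / c

def mollified_uniform_sample(c, rng):
    """Draw via the decomposition X = U + L, U ~ U(0,1), L ~ Logistic(0, 1/c)."""
    u, v = rng.random(), rng.random()
    return u + (1.0 / c) * math.log(v / (1.0 - v))  # logistic via inverse CDF
```

The decomposition makes the moment formulae transparent: the mean is 1/2 (both summands are symmetric about their centres) and the variance is 1/12 plus the logistic variance π²/(3c²), which vanishes as c grows.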
Derivations
Proof of Lemma 2
For the discrete binomial distribution, the TV distance is computed as (Gibbs & Su, 2002, p. 424)
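For two binomial distributions, the TV formula of Gibbs & Su (2002) reduces to half the sum of absolute pmf differences over the common support, which is straightforward to implement (helper names illustrative):

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability mass function P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def tv_distance_binomial(n, p, q):
    """Total variation distance between Bin(n, p) and Bin(n, q):
    TV = (1/2) * sum_k |P(X = k) - P(Y = k)| over k = 0, ..., n."""
    return 0.5 * sum(abs(binom_pmf(k, n, p) - binom_pmf(k, n, q))
                     for k in range(n + 1))
```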
The proof of the first inequality in Lemma 2,
is done by induction. Recall that such that . For the initial case , we get
such that (B.1) holds. So let us turn to the inductive step. Given that the proposed inequality (B.1) holds for some , we have for that
So,
which completes the proof of (B.1). Then, using that
Proof of Theorem 2.2
The argumentation for (A1) and (A2) is nearly identical to the one in Supplement S.4 of Weiß et al. (2022). With according to (2.4), and with and , it holds that
Therefore, the drift condition (A1’) and, thus, the geometric drift condition (A1) in Doukhan & Neumann (2019) are satisfied.
To prove the semi-contractive condition (A2), we utilize the mean value theorem and conclude that for some with . Since according to (A2), also such that is Lipschitz-continuous with constant 1. In particular,
for all , so (A2) follows.
A major difference to Weiß et al. (2022) is the argumentation for the similarity condition (A3). Here, Lemmata 1 and 2 allow us to conclude that there exists a such that
holds for the conditional distribution of the scBINGARCH process (2.6). So the proof of Theorem 2.2 is complete.
Proof of (3.5)
To prove that Condition 5.1 of Billingsley (1961) holds, first note that the set of pairs , where the transition probability
is strictly positive, is independent of the parameter vector , because always holds, and the binomial distribution always has the full support . Furthermore, the have continuous (third-order) partial derivatives w. r. t. , because they are compositions of continuously differentiable functions.
Let us have a closer look at the first-order derivatives. For the sake of readability, we omit the subscript ‘’ in the sequel. Then, we compute
where the partial derivative of with respect to the soft-clipping function is given by
The partial derivatives of the soft-clipping function with respect to , in turn, are
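Since these derivative computations all rest on the derivative of the soft-clipping function, a quick numerical sanity check is useful. The sketch below assumes sc_c(x) = (softplus(cx) − softplus(c(x−1)))/c, for which the first derivative is σ(cx) − σ(c(x−1)) with σ the logistic sigmoid, and compares it against a central finite difference; all names are illustrative.

```python
import math

def sigmoid(z):
    """Logistic sigmoid, evaluated stably for large |z|."""
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))

def soft_clip(x, c):
    """Soft clipping sc_c(x) as a difference of stable softplus terms."""
    sp = lambda z: z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))
    return (sp(c * x) - sp(c * (x - 1))) / c

def soft_clip_deriv(x, c):
    """First derivative sc_c'(x) = sigma(c*x) - sigma(c*(x-1)),
    since d/dx softplus(c*x)/c = sigma(c*x)."""
    return sigmoid(c * x) - sigmoid(c * (x - 1))
```

The derivative is strictly positive everywhere and maximal at x = 1/2, consistent with sc_c being strictly increasing with its steepest (nearly unit) slope in the middle of the unit interval.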
We define the following Jacobian matrix:
If this matrix has full rank throughout the parameter space, then Condition 5.1 holds. We consider the following (2 × 2) quadratic sub-matrix:
The determinant of is given by
Because , we have . Therefore, and have full rank, which completes the proof of (3.5).
Additional Estimation of (Remark 1)
Generally, the previous argumentation could be extended to also cover the adjustment parameter . Since a bivariate Markov chain is fully specified by two model parameters, the upper bound needs to satisfy if also needs to be estimated. In addition to the previous derivatives, one now also needs the partial derivative of w. r. t. , that is,
While with a maximum in , has a point of symmetry in : . Next, one defines the Jacobian matrix
which needs to be shown to have full rank throughout the parameter space. Again, one may look at appropriate (3 × 3) quadratic sub-matrices for proving the full rank. For example, the sub-matrix
has the determinant with
Since the first factor is always positive, it remains to show that is non-zero for the considered parameter scenario.
Proof of Theorem 1
We have to verify the conditions (A0)–(A7) in Davis & Liu (2016). For this purpose, we adapt the notations of Davis & Liu (2016). The distribution belongs to the one-parameter exponential family with
Applied to model (2.14) with , we have the relations
Furthermore, the partial derivatives are determined by differentiation of the inverse function :
is satisfied because of the parameter restrictions for the scBINGARCH process.
Because of Lemma 2.2, there exist such that for all . Thus, the range of has a strictly positive lower bound.
holds because is continuous and, thus, is a continuous function of . Actually, is continuously differentiable up to any order, so (A6) also holds.
does not apply here.
Because of , also is a bounded r. v. So to show (A4), the mean of a bounded r. v. needs to be computed, which necessarily takes a finite value.
We adapt the argumentation in Appendix C.6 of Davis & Liu (2016). Assume that there is a such that almost surely, that is,
Since is injective, implies , so
In particular, and for some functions , that is, . So with an analogous argumentation as in Appendix C.6 of Davis & Liu (2016), we conclude .
is bounded by (B.3). Together with Lemma 2.2, it holds that . Using (B.4), we compute
Here, with with , we have
So each partial derivative satisfies a linear first-order difference equation of the form
Thus, has a bounded range, and (A7) holds.
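This boundedness argument can be illustrated generically: for a recursion d_t = a·d_{t−1} + b_t with |a| < 1 and |b_t| ≤ B, every iterate satisfies |d_t| ≤ |a|^t·|d_0| + B/(1 − |a|), so the sequence has a bounded range. A minimal sketch with illustrative coefficients:

```python
import random

def iterate_difference_eq(a, b_seq, d0=0.0):
    """Iterate the linear first-order difference equation d_t = a*d_{t-1} + b_t
    and return the sequence of iterates."""
    d = d0
    out = []
    for b in b_seq:
        d = a * d + b
        out.append(d)
    return out

# With |a| < 1 and |b_t| <= B, unrolling the recursion gives the geometric
# bound |d_t| <= |a|^t * |d0| + B * (1 + |a| + |a|^2 + ...) <= |d0| + B / (1 - |a|),
# so the iterates stay in a bounded set no matter how long we iterate.
```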
Supplementary materials
Supplementary materials for this article are available online.
Supplemental Material for Soft-clipping INGARCH models for time series of bounded counts by Christian H. Weiß, Malte Jahn, in Statistical Modelling
Acknowledgements
The authors thank the editor, the associate editor and the two referees for their useful comments on an earlier draft of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The authors received no financial support for the research, authorship and/or publication of this article.
References
1. Billingsley P (1961) Statistical Inference for Markov Processes. Chicago: University of Chicago Press.
2. Bradley RC (2005) Basic properties of strong mixing conditions: A survey and some open questions. Probability Surveys, 2, 107–144.
3. Cai Z, He X, Sun J and Vasconcelos N (2017) Deep learning with low precision by half-wave Gaussian quantization. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, pp. 5406–5414.
4. Chen H, Li Q and Zhu F (2020) Two classes of dynamic binomial integer-valued ARCH models. Brazilian Journal of Probability and Statistics, 34, 685–711.
5. Chen H, Li Q and Zhu F (2022) A new class of integer-valued GARCH models for time series of bounded counts with extra-binomial variation. AStA Advances in Statistical Analysis, 106, 243–270.
6. Cui Y and Zheng Q (2017) Conditional maximum likelihood estimation for a class of observation-driven time series models for count data. Statistics & Probability Letters, 123, 193–201.
7. Davis RA and Liu H (2016) Theory and inference for a class of nonlinear models with application to time series of counts. Statistica Sinica, 26, 1673–1707.
8. Doukhan P and Neumann M (2019) Absolute regularity of semi-contractive GARCH-type processes. Journal of Applied Probability, 56, 91–115.
9. Ferland R, Latour A and Oraichi D (2006) Integer-valued GARCH processes. Journal of Time Series Analysis, 27, 923–942.
10. Fokianos K (2011) Some recent progress in count time series. Statistics, 45, 49–58.
11. Friedrichs KO (1944) The identity of weak and strong extensions of differential operators. Transactions of the American Mathematical Society, 55, 132–151.
12. Gibbs AL and Su FE (2002) On choosing and bounding probability metrics. International Statistical Review, 70, 419–435.
13. Jacobs PA and Lewis PAW (1983) Stationary discrete autoregressive-moving average time series generated by mixtures. Journal of Time Series Analysis, 4, 19–36.
14. Jentsch C and Reichmann L (2019) Generalized binary time series models. Econometrics, 7, 47.
15. Johnson NL, Kotz S and Balakrishnan N (1995) Continuous Univariate Distributions, Volume 2, 2nd edition. Hoboken, NJ: John Wiley & Sons.
16. Klimek MD and Perelstein M (2020) Neural network-based approach to phase space integration. SciPost Physics, 9, 053.
17. Lin GD, Dou X, Kuriki S and Huang J-S (2014) Recent developments on the construction of bivariate distributions with fixed marginals. Journal of Statistical Distributions and Applications, 1, 14.
18. Liu M, Zhu F and Zhu K (2022a) Modeling normalcy-dominant ordinal time series: An application to air quality level. Journal of Time Series Analysis, 43, 460–478.
19. Liu M, Li Q and Zhu F (2022b) Modeling air quality level with a flexible categorical autoregression. Stochastic Environmental Research and Risk Assessment, 1–11. https://doi.org/10.1007/s00477-021-02164-0
20. McKenzie E (1985) Some simple models for discrete variate time series. Water Resources Bulletin, 21, 645–650.
21. Mei H and Eisner J (2017) The neural Hawkes process: A neurally self-modulating multivariate point process. In: Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), edited by von Luxburg et al., pp. 6757–6767. Red Hook, NY: Curran Associates Inc.
22. Ristić MM, Weiß CH and Janjić AD (2016) A binomial integer-valued ARCH model. International Journal of Biostatistics, 12, 20150051.
23. Steutel FW and van Harn K (1979) Discrete analogues of self-decomposability and stability. Annals of Probability, 7, 893–899.
24. Weiß CH (2018) An Introduction to Discrete-valued Time Series. Chichester: John Wiley & Sons.
25. Weiß CH (2020) Distance-based analysis of ordinal data and ordinal time series. Journal of the American Statistical Association, 115, 1189–1200.
26. Weiß CH and Pollett PK (2014) Binomial autoregressive processes with density dependent thinning. Journal of Time Series Analysis, 35, 115–132.
27. Wiemann PFV, Kneib T and Hambuckers J (2021) Using the softplus function to construct alternative link functions in generalized linear models and beyond. arXiv:2111.14207v1.