
Abstract

This article studies long-term and short-term volatility and co-volatility in stock markets by introducing modelling strategies for multivariate data analysis that deal with serially correlated innovations and cross-section dependence. In particular, it presents an innovative mixed-effects model through a GARCH process, allowing for heterogeneity effects and time-series dynamics. We propose a non-parametric regression model based on penalized low-rank smoothing splines to introduce time trends into the variance and covariance equations. The strategy provides flexible modelling of the low-frequency volatility and co-volatility in equity markets. The decomposed low-frequency matrix is modelled using the modified Cholesky factorization. The Hamiltonian Monte Carlo technique is implemented as a Bayesian computing process for estimating parameters and latent factors. The advantage of our modelling strategy in empirical studies is highlighted by examining the effect of latent financial factors on a panel of 10 equities over 110 weekly series. The model can non-parametrically differentiate the dynamic patterns of the high- and low-frequency components of the variance-covariance structural equations and incorporate economic features to predict variability in stock markets based on time-series evidence.

Keywords

Hamiltonian Monte Carlo, modified Cholesky decomposition, multivariate GARCH, non-parametric regression, optimization

1 Introduction

Most effort in the literature on financial and economic multivariate data with serially correlated time-series has been devoted to slow-moving variation and covariation. Related studies have frequently centred on long-run variation. For example, Engle and Rangel (2008) introduced a notable spline-GARCH model for low-frequency volatility and global macroeconomic events. Audrino and Bühlmann (2009) offered a spline regression model in the non-parametric setting to predict the volatility of financial time-series. Several useful strategies that yield high-performance model fits have been introduced for mixed modelling with smoothing splines (Wahba (1990); Eilers and Marx (1996); Wand (2003)). Speckman and Sun (2003) applied full Bayesian smoothing splines, thin-plate splines, L-splines and several models using intrinsic autoregressive priors. Vogt and Linton (2017) investigated longitudinal data analysis with non-parametric regression functions in the case of varying individual-specific effects.

The treatment of volatilities is conventionally based on variances, while the modelling of covariance functions can provide a better representation. This is reasonable since volatility in one market can cause volatility in other markets. Thus, most research recommends fitting the multivariate generalized autoregressive conditional heteroscedastic (MGARCH) model. It can describe the volatility in financial markets and efficiently estimate large time-varying covariance matrices. Rangel and Engle (2012) characterize the correlation matrix's high- and low-frequency components by combining a factor model with other specifications that capture the dynamic behaviour of volatilities and covariances between a common factor and idiosyncratic returns. Engle and Sokalska (2012) introduced an intraday volatility forecasting model for some assets by decomposing the volatility of high-frequency returns into several components.

Extensions of multivariate GARCH models to cross-sectional time-series data merit further investigation since they can conveniently consider volatilities and co-volatilities in correlated cross-section financial data. So far, the methodology of existing models has concentrated on volatility using the quadratic-spline method. There is a gap in the literature for flexible low-frequency models that deliver more precise predictions. These issues motivate us to introduce the spline-GARCH mixed-effects model by extending the univariate case of Engle and Rangel (2008). We study low-frequency changes using semi-parametric mixed-effects models having a time-varying conditional covariance matrix. We let parameters be heterogeneous across assets to control for possibly correlated, time-invariant heterogeneity effects.

Frequently, the volatility of an asset affects other assets through conditional covariances over time. We expect that a shock in a market increases the volatility of other associated markets. However, the influence of negative and positive shocks does not lead to the same fluctuation. Our study aims to present the cross-asset dependence between asset returns, which changes over time intervals, and the amount of volatility during a specific period. Our findings reveal that cross-asset dependence may not increase in the long term. It may occur because of the globalization of financial markets. The dependence effect can be investigated directly using cross-section and time-series modelling specified by the dynamic behaviour of covariances or correlations.

In addition to preserving appropriate features of the GARCH model, its time-varying extension can sufficiently isolate high- and low-frequency volatility and co-volatility. This extension leads to coherent predictions since short-term volatility immediately affects the log-return, while long-term volatility possesses a significant role in future transactions. It also helps financial traders with proper forecasting decisions.

We organize the article as follows. In Section 2, we introduce the mixed-effects spline MGARCH model. Section 3 implements an advanced numerical estimation method, the so-called Hamiltonian or hybrid Monte Carlo (HMC), to provide inference. In Section 4, we conduct simulation studies to investigate the main properties of our model and present an empirical study on predicting low- and high-frequency volatility of log-returns data. Section 5 includes concluding remarks.

2 Specification of the proposed model

We first specify the heterogeneous MGARCH model and then extend the mixed-effects methodology to time-varying MGARCH models aiming to describe low-frequency volatilities and co-volatilities.

2.1 Heterogeneous MGARCH model

For a cross-section of $N$ assets and $T$ periods, let the $r_{it}$ , for $i = 1, 2, \dots, N$ and $t = 1, 2, \dots, T$ be the return at time $t$ on the $i$ th asset. Assume that the unexpected returns follow the multivariate GARCH model

ϵ_{t} = r_{t} - E (r_{t} | I_{t - 1}) = Σ_{t}^{1 / 2} e_{t},

(2.1)

where the innovation term $e_{t}$ follows the multivariate normal distribution with zero-mean vector and identity covariance matrix. Then, the unexpected returns series $ϵ_{t}$ are normally distributed with zero means and heterogeneous variances and covariances $Σ_{t} = \{σ_{ij, t}\}$ for all $i, j = 1, \dots, N$ . Assume that there is no autocorrelation and no contemporaneous cross-sectional correlation in the innovations $e_{t}$ . Consider the following equations for variances and covariances

σ_{i i, t} = ω_{i i} + γ ϵ_{i, t - 1}^{2} + δ σ_{i i, t - 1},

(2.2)

σ_{i j, t} = ω_{i j} + λ ϵ_{i, t - 1} ϵ_{j, t - 1} + θ σ_{i j, t - 1},

(2.3)

for $t > 1$ and $i \neq j$ , where $ω_{ii} > 0$ and $ω_{ij} \in ℝ$ . A convenient covariance matrix for the start-up process is $Σ_{1} = \{σ^{2} ω_{ij}\}$ , where $σ^{2}$ is a constant positive parameter. This setting accounts for the correlation between the elements of $Σ_{1}$ and the subsequent components (2.2) and (2.3). Also, it provides valid inference since the estimation process uses information from all periods. Sufficient conditions for the conditional variance and covariance processes to converge to a positive definite (PD) matrix are $γ + δ < 1$ and $λ + θ < 1$ . Also, under the conditions $γ ⩾ λ$ , $δ ⩾ θ$ , and positive definiteness of the random-effects matrix $\{ω_{ij}\}$ , the matrix $Σ_{t}$ will be PD for all $t$ .
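As a minimal illustration of recursions (2.2) and (2.3), the following Python sketch simulates unexpected returns and records every $Σ_{t}$ so that its positive definiteness can be checked numerically. The function name `simulate_mgarch`, the use of NumPy, the choice of the Cholesky factor as the square root $Σ_{t}^{1 / 2}$ , and the numerical values of $\{ω_{ij}\}$ are assumptions made for illustration only.

```python
import numpy as np

def simulate_mgarch(W, gamma, delta, lam, theta, T, sigma2=1.0, seed=0):
    """Simulate eps_t = Sigma_t^{1/2} e_t with Sigma_t following (2.2)-(2.3)."""
    rng = np.random.default_rng(seed)
    N = W.shape[0]
    off = 1.0 - np.eye(N)                      # mask of off-diagonal entries
    A = gamma * np.eye(N) + lam * off          # gamma on diagonal, lambda elsewhere
    B = delta * np.eye(N) + theta * off        # delta on diagonal, theta elsewhere
    Sigma = sigma2 * W                         # start-up matrix Sigma_1
    eps = np.empty((T, N))
    Sigmas = np.empty((T, N, N))
    for t in range(T):
        Sigmas[t] = Sigma
        chol = np.linalg.cholesky(Sigma)       # raises LinAlgError if not PD
        eps[t] = chol @ rng.standard_normal(N)
        # element-wise products apply (2.2) on the diagonal and (2.3) elsewhere
        Sigma = W + A * np.outer(eps[t], eps[t]) + B * Sigma
    return eps, Sigmas

# Hypothetical inputs: three assets, PD intercepts, gamma >= lambda, delta >= theta
W = 0.10 * np.eye(3) + 0.05 * np.ones((3, 3))
eps, Sigmas = simulate_mgarch(W, gamma=0.25, delta=0.40, lam=0.10, theta=0.20, T=200)
min_eig = np.linalg.eigvalsh(Sigmas).min()     # smallest eigenvalue over all t
```

With these inputs the smallest eigenvalue over all periods stays strictly positive, in line with the stated sufficient conditions.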

2.2 Modelling techniques to isolate volatility and co-volatility

We now extend mixed-effects modelling of the heterogeneous MGARCH using the spline-GARCH technique while describing low-frequency volatilities and co-volatilities in the equities market. Commonly, stock returns possess time-varying variances and covariances due to daily news intensity that (a) depend on macroeconomic events or time-varying conditions that some observed covariates can explain, and (b) respond to several latent factors that often cannot be recognized, at least without additional information. Flexible modelling of the low-frequency volatility and co-volatility, say $τ_{ij, t}$ , as variance and covariance components, can deliver the information of all observed and latent factors. This motivates us to develop the non-parametric model fitting as follows. First, we reparameterize (2.2) and (2.3) to the decomposition form

σ_{i i, t} = h_{i i, t} τ_{i i, t},

(2.4)

where the $τ_{ii, t}$ are the low-frequency volatility components. The parameter $h_{ii, t}$ , for each $i$ , is a non-negative time-series, for example, GARCH(1,1) with unit unconditional mean, expressed as

\begin{matrix} h_{ii, t} = ω_{ii} + γ \frac{ε_{i, t - 1}^{2}}{τ_{ii, t - 1}} + δ h_{ii, t - 1} . \end{matrix}

(2.5)

Similarly,

\begin{matrix} σ_{ij, t} = h_{ij, t} τ_{ij, t}, \end{matrix}

(2.6)

for $i \neq j$ , where the $τ_{ij, t}$ are the low-frequency co-volatility components and

\begin{matrix} h_{ij, t} = ω_{ij} + λ \frac{ε_{i, t - 1} ε_{j, t - 1}}{\sqrt{τ_{ii, t - 1} τ_{jj, t - 1}}} + θ h_{ij, t - 1} . \end{matrix}

(2.7)

Equation (2.7) provides a well-defined description of co-volatility. We let (2.5) and (2.7) be heterogeneous by assuming random elements $ω_{ii}$ and $ω_{ij}$ as discussed later. This strategy extends the previous work on volatility modelling from the univariate to multivariate settings, in which a slowly moving unconditional volatility is incorporated into a GARCH-type model. The unconditional volatility directly depends on the macroeconomic information in a GARCH model. This article further studies the low-frequency co-volatility in the correlated cross-sectional individuals. Adopting random-effects models, the correlation between the $i$ th and $j$ th log-return is

\mathrm{Corr} (ϵ_{i t}, ϵ_{j t}) = \frac{ω_{i j}}{\sqrt{ω_{i i} ω_{j j}}} \frac{τ_{i j, t}}{\sqrt{τ_{i i, t} τ_{j j, t}}} \frac{1 - γ - δ}{1 - λ \frac{τ_{i j, t}}{\sqrt{τ_{i i, t} τ_{j j, t}}} - θ},

(2.8)

for $i, j = 1, \dots, N$ , $i \neq j$ . Equation (2.8) illustrates that the correlation coefficient is proportional to the associated random effects and low-frequency factors. A new presentation of the originally proposed model is necessary to provide inference. Denote the $N \times 1$ vector $r_{t} = (r_{1 t}, r_{2 t}, \dots, r_{Nt})^{'}$ for $t = 1, \dots, T$ . The $N$ -dimensional unexpected returns vector $ϵ_{t}$ is conditionally normal with zero mean and the $N \times N$ covariance matrix $Σ_{t}$ having diagonal and off-diagonal elements (2.4) and (2.6), respectively.

The representation of unexpected returns in vector notation is similar to the familiar MGARCH model, while random effects are present in the model specification. To estimate model parameters in frequentist or Bayesian frameworks, we may face two serious challenges: (a) there is no guarantee that the conditional covariance matrix of innovations, $Σ_{t}$ for each $t = 1, \dots, T$ , is PD and (b) a large number of parameters need to be estimated. The following prerequisites are necessary before we present conditions to guarantee that $Σ_{t}$ is PD. The covariance matrix $Σ_{t}$ can easily be rewritten as

\begin{matrix} Σ_{t} = τ_{t} ⊙ H_{t}, \end{matrix}

(2.9)

where $⊙$ denotes the Hadamard product of two matrices, $τ_{t}$ is the low-frequency volatility matrix with elements $τ_{ij, t}$ , and $H_{t} = \{h_{ij, t}\}$ . If $τ_{t}$ and $H_{t}$ are positive semi-definite matrices, then so is $Σ_{t}$ , by the Schur product theorem. The model can be represented in the Vech Matrix-Diagonal form to guarantee the positive definiteness of the conditional covariance matrix. Rewrite $H_{t}$ in the form

\begin{matrix} H_{t} = {CC}^{'} + {AA}^{'} ⊙ (ϵ_{t - 1} ϵ_{t - 1}^{'} ø τ_{t - 1} τ_{t - 1}^{'}) + {BB}^{'} ⊙ H_{t - 1}, \end{matrix}

(2.10)

where $ø$ denotes Hadamard division, and $C$ , $A$ and $B$ denote $N \times N$ parameter matrices with either symmetric or lower triangular form. Let ${CC}^{'} = W = \{ω_{ij}\}$ ,

{AA}^{'} = (γ - λ) I_{N} + λ J_{N}, {BB}^{'} = (δ - θ) I_{N} + θ J_{N},

(2.11)

where $I_{N}$ denotes the $N \times N$ identity matrix and $J_{N} = 1_{N} 1_{N}^{'}$ is a square matrix of ones with $1_{N} = (1, 1, \dots, 1)^{'}$ . The general structure for matrices ${AA}^{'}$ , ${BB}^{'}$ , and ${CC}^{'}$ imposes $3 N (N + 1) / 2$ non-redundant parameters on the model. A simple Diagonal-Vech representation, proposed in the literature, restricts these coefficient matrices to be diagonal. The number of parameters is then dramatically reduced because each conditional covariance depends only on its own past values and innovations, and no interaction is included between conditional variances and covariances (Bollerslev et al.(1988)). While several structures are available, we examine the compound-symmetry representation for autoregressive coefficients instead. It is a reasonable structure in variance-covariance analysis, which incorporates homogeneous effects via $γ$ and $δ$ for the variance equation and $λ$ and $θ$ for the covariance equation. Also, we let the covariance matrix be heterogeneous by setting the elements of $W$ to follow a random-intercept model. This strategy sensibly reduces the number of coefficients, and heterogeneity in $W$ accounts for variability. As seen later, it helps us to handle computational difficulties in empirical studies. Under the preceding conditions on all parameters, the matrix $Σ_{t}$ will be PD for all $t$ if the initial covariance matrix $Σ_{1}$ is PD.
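The compound-symmetry structure in (2.11) has a simple spectrum, which makes the positive semi-definiteness conditions easy to verify. The sketch below, with the hypothetical helper `compound_symmetry` and illustrative values of $N$ , $γ$ and $λ$ , compares the numerical eigenvalues of ${AA}^{'}$ with the closed form $γ - λ$ (multiplicity $N - 1$ ) and $γ + (N - 1) λ$ , and then checks on one arbitrary example that a Hadamard product with a positive semi-definite matrix remains positive semi-definite.

```python
import numpy as np

def compound_symmetry(diag, off, N):
    """Return (diag - off) * I_N + off * J_N, the structure used in (2.11)."""
    return (diag - off) * np.eye(N) + off * np.ones((N, N))

N, gamma, lam = 5, 0.25, 0.10                  # illustrative values only
AA = compound_symmetry(gamma, lam, N)
eigenvalues = np.linalg.eigvalsh(AA)           # returned in ascending order
# Closed form: gamma - lam with multiplicity N - 1, and gamma + (N - 1) * lam once
expected = np.array([gamma - lam] * (N - 1) + [gamma + (N - 1) * lam])

# Schur product theorem: the Hadamard product of two PSD matrices is PSD
rng = np.random.default_rng(1)
X = rng.standard_normal((N, N))
M = X @ X.T                                    # an arbitrary PSD matrix
min_eig_product = np.linalg.eigvalsh(AA * M).min()
```

Because both eigenvalues are non-negative whenever $γ ⩾ λ ⩾ 0$ , the same check applies to ${BB}^{'}$ with $δ$ and $θ$ .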

Now, we present a practical trick to confirm the positive definiteness of the low-frequency matrix. Assume that

\begin{matrix} τ_{t} = Q_{t} F_{t}^{2} Q_{t}^{'}, \end{matrix}

(2.12)

for $t = 1, \dots, T$ , where $F_{t}$ denotes a diagonal matrix of positive structural volatility elements and $Q_{t}$ is the lower triangular matrix of co-volatility components with unit diagonal elements and other elements determined by P-spline functions. Equation (2.12) leads to the modified Cholesky decomposition (Pourahmadi (2007)) for modelling low-frequency volatility.

Denote $F_{t} = \mathrm{diag} (f_{1} (t), \dots, f_{N} (t))$ and $g_{t} = (g_{1} (t), \dots, g_{P} (t))^{'}$ as a vector of order $P = N \times (N - 1) / 2$ , which contains the lower-triangular elements of $Q_{t} = \{q_{ij, t}\}$ stacked by rows, where $q_{ij, t} = g_{p} (t)$ for $i > j$ and $p = i (i - 1) / 2 - (i - j - 1)$ . This novel formulation reduces the multivariate dynamics to a univariate volatility process and guarantees the positive definiteness of $τ_{t}$ .
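To make the index mapping and the decomposition (2.12) concrete, the following sketch assembles $τ_{t}$ from given values of $f_{i} (t)$ and $g_{p} (t)$ at a single time point. The helper names `pair_index` and `low_frequency_matrix` and the numerical inputs are hypothetical; in the model these values come from the spline functions.

```python
import numpy as np

def pair_index(i, j):
    """1-based position p of q_{ij} (i > j) in g_t, with rows stacked in order."""
    return i * (i - 1) // 2 - (i - j - 1)

def low_frequency_matrix(f, g):
    """Build tau_t = Q_t F_t^2 Q_t' from f (length N) and g (length N(N-1)/2)."""
    N = len(f)
    Q = np.eye(N)                              # unit diagonal, zeros above it
    for i in range(2, N + 1):
        for j in range(1, i):
            Q[i - 1, j - 1] = g[pair_index(i, j) - 1]
    F2 = np.diag(np.asarray(f, dtype=float) ** 2)
    return Q @ F2 @ Q.T

# Illustrative values for N = 3 assets (P = 3 co-volatility components)
f = [1.0, 0.5, 2.0]
g = [0.3, -0.2, 0.7]
tau = low_frequency_matrix(f, g)
```

Since $Q_{t}$ has unit diagonal, the determinant of $τ_{t}$ equals the product of the squared $f_{i} (t)$ , so $τ_{t}$ is PD whenever all $f_{i} (t)$ are non-zero, whatever the values of $g_{p} (t)$ .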

We use a penalized splines technique, the so-called P-splines (Ruppert et al.(2003); Crainiceanu et al.(2005)). Let the functions $f_{i}$ and $g_{j}$ be deterministic low-rank regression spline functions. These functions can be derived by dividing the time domain into contiguous intervals and representing the unknown functions $f_{i}$ and $g_{j}$ by a separate function in each interval. Also, each function should join smoothly at the fixed knots $κ_{k}$ for $k = 1, \dots, K$ and $ϰ_{s}$ for $s = 1, \dots, S$ . Thus, the low-rank thin-plate splines must be continuous on the whole interval $[1, T]$ . For fixed knots, we have

f_{i} (t ∣ {\{κ_{k}\}}_{k = 1}^{K}) = ϕ_{i 0} + ϕ_{i 1} t + \sum_{k = 1}^{K} v_{i k} {|t - κ_{k}|}^{3},

(2.13)

g_{j} (t ∣ {\{ϰ_{s}\}}_{s = 1}^{S}) = ψ_{j 0} + ψ_{j 1} t + \sum_{s = 1}^{S} u_{j s} {|t - ϰ_{s}|}^{3},

(2.14)

where the regression coefficients $V_{k} = (v_{1 k}, \dots, v_{Nk})^{'}$ and $U_{s} = (u_{1 s}, \dots, u_{Ps})^{'}$ represent the change in gradient between consecutive line segments. Smooth estimates of $f_{i}$ and $g_{j}$ may be obtained by allowing $v_{ik}$ and $u_{js}$ to be random coefficients, independently distributed as $v_{i k} \overset{iid}{\sim} N (0, σ_{v}^{2})$ and $u_{j s} \overset{iid}{\sim} N (0, σ_{u}^{2})$ (Gurrin et al.(2005)). Furthermore, $\{κ_{1} = 0 < κ_{2} < \dots < κ_{K} = T\}$ and $\{ϰ_{1} = 0 < ϰ_{2} < \dots < ϰ_{S} = T\}$ , respectively, represent partitions of the time horizon $T$ into $K$ and $S$ equally spaced intervals. The values $K$ and $S$ denote the number of knots in the spline mixed-effects model of the associated variances and covariances. The analyst can employ information criteria to determine optimal choices of $K$ and $S$ , which govern the cyclical pattern in the low-frequency volatility trend. Large values of $K$ and $S$ provide more frequent cycles. The coefficient vectors $ϕ_{i} = (ϕ_{i 0}, ϕ_{i 1})^{'}$ and $ψ_{j} = (ψ_{j 0}, ψ_{j 1})^{'}$ govern the sharpness of each cycle. Following Ruppert (2002), one can take the number of knots large enough to ensure the desired flexibility, with the knots $κ_{k}$ and $ϰ_{s}$ placed at the sample quantiles of time $t$ corresponding to probabilities $k / (K + 1)$ and $s / (S + 1)$ for all $k$ and $s$ . However, other choices of knots are also available. For (2.13), the basis function set is $\{1, t, {|t - κ_{1}|}^{3}, \dots, {|t - κ_{K}|}^{3}\}$ , since any low-rank thin-plate spline with the given knots is a linear combination of this set, while for (2.14) the basis set is $\{1, t, {|t - ϰ_{1}|}^{3}, \dots, {|t - ϰ_{S}|}^{3}\}$ .
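The basis in (2.13) can be assembled in a few lines. The sketch below is one possible construction under the quantile-based knot rule mentioned above; the function names `thin_plate_design` and `spline_value`, the number of knots, and the coefficient values are assumptions chosen only to show the shapes involved.

```python
import numpy as np

def thin_plate_design(t, K):
    """Return the fixed part [1, t], the random part [|t - kappa_k|^3] and the knots."""
    t = np.asarray(t, dtype=float)
    probs = np.arange(1, K + 1) / (K + 1)      # k / (K + 1), k = 1, ..., K
    knots = np.quantile(t, probs)              # sample quantiles of time
    X = np.column_stack([np.ones_like(t), t])
    Z = np.abs(t[:, None] - knots[None, :]) ** 3
    return X, Z, knots

def spline_value(X, Z, phi, v):
    """Evaluate f(t) = phi_0 + phi_1 t + sum_k v_k |t - kappa_k|^3 for one asset."""
    return X @ phi + Z @ v

T, K = 110, 4                                  # illustrative horizon and knot count
t = np.arange(1, T + 1)
X, Z, knots = thin_plate_design(t, K)
rng = np.random.default_rng(0)
phi = np.array([0.5, 0.01])                    # hypothetical fixed effects
v = rng.normal(0.0, 1e-6, size=K)              # v_k ~ N(0, sigma_v^2), small sigma_v
f = spline_value(X, Z, phi, v)
```

Setting all $v_{k}$ to zero recovers the linear trend $ϕ_{0} + ϕ_{1} t$ , which is the sense in which shrinking the random coefficients towards zero yields a smoother fit.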

Smoothing splines require some background knowledge to choose suitable spline functions. In practical applications, regression splines yield similar estimates with fewer knots than smoothing splines, particularly when estimated by discrete penalized techniques. Various splines can be formed by altering the choice of knots and changing how roughness in the estimated regression function is penalized. One approach places a knot at each value of the variable, which, given an appropriate choice of roughness penalty, leads to a natural cubic smoothing spline. Fitting regression splines requires a careful choice of knot locations and basis functions, which introduces some subjectivity into the model fitting process. Greater smoothness can be achieved by shrinking the estimated coefficients towards zero.

Let $X_{t} = \{1, t\}$ , $Z_{1 t} = \{{|t - κ_{1}|}^{3}, \dots, {|t - κ_{K}|}^{3}\}$ , and $Z_{2 t} = \{{|t - ϰ_{1}|}^{3}, \dots, {|t - ϰ_{S}|}^{3}\}$ . The non-parametric regression equations $f_{i}$ and $g_{j}$ can be represented by the mixed-effects models

f_{i} (t) = X_{t} ϕ_{i} + Z_{1 t} V_{k},

(2.15)

g_{j} (t) = X_{t} ψ_{j} + Z_{2 t} U_{s},

(2.16)

for $i = 1, \dots, N$ and $j = 1, \dots, P$ . The first terms in (2.15) and (2.16) specify fixed effects, and the second terms smooth the unknown functions $f_{i}$ and $g_{j}$ , respectively. This modelling strategy requires estimating a large number of unknown parameters, especially when the number of assets increases. It may render low-frequency volatility models unidentifiable. To handle these issues in empirical studies, we consider assumptions for $ϕ_{i}$ and $ψ_{j}$ . In modelling $f_{i}$ and $g_{j}$ , we allow the following forms of parameter heterogeneity:

Homogeneous intercepts and slopes: Coefficient vectors $ϕ_{i}$ and $ψ_{j}$ reduce to $ϕ$ and $ψ$ , restricting common intercepts and slopes for assets.

Heterogeneous intercepts and homogeneous slopes: Coefficient vectors $ϕ_{i}$ and $ψ_{j}$ reduce to $(ϕ_{i 0}, ϕ_{1})^{'}$ and $(ψ_{j 0}, ψ_{1})^{'}$ , such that $ϕ_{i 0}$ and $ψ_{j 0}$ for $i = 1, \dots, N$ and $j = 1, \dots, P$ are individual random effects that follow known distributions $G_{ϕ} (\cdot)$ and $G_{ψ} (\cdot)$ , respectively. The low-frequency volatility model for variance and covariance components includes common slopes.

Both intercepts and slopes heterogeneous: Let coefficient vectors $ϕ_{i}$ and $ψ_{j}$ for $i = 1, \dots, N$ and $j = 1, \dots, P$ be random effects that follow distributions $G_{ϕ} (\cdot)$ and $G_{ψ} (\cdot)$ .

Since the proposed model contains many latent variables, the estimation process is complicated and calls for advanced numerical approaches. Hamiltonian Monte Carlo is a useful Bayesian computing tool to deal with such issues.

3 Bayesian computational aspects

Bayesian computational tools are convenient to fit complex financial models that appear in hierarchical representations. Bayesian inference requires the joint posterior distribution of all unknown model parameters. The conditional density of $ϵ_{t}$ is given by

\begin{matrix} π (ε_{t}| Σ_{t}) = (2 π)^{- \frac{N}{2}} {|Σ_{t}|}^{- \frac{1}{2}} exp (- \frac{1}{2} ε_{t}^{'} Σ_{t}^{- 1} ε_{t}) . \end{matrix}

(3.1)

For the complete dataset, the conditional likelihood function is $π (ϵ| Σ_{1}, \dots, Σ_{T}) = \prod_{t = 1}^{T} π (ϵ_{t}| Σ_{t}),$ where matrix $ϵ = (ϵ_{1}, \dots, ϵ_{T})$ . As (2.5) and (2.7) involve some restrictions, the Bayesian implementation requires assigning priors of bounded supports.

3.1 Prior specification for the high-frequency volatility model

The Bayesian analysis requires specifying prior distributions for parameters under the conditions $ω_{ii} > 0$ , $γ + δ < 1$ , and $λ + θ < 1$ . These are sufficient for the conditional variance and covariance processes to converge to some fixed numbers or a certain fixed PD matrix. Furthermore, the positive definiteness of ${AA}^{'}$ , ${BB}^{'}$ , and ${CC}^{'}$ in (2.11) is sufficient for the positive semi-definiteness of $H_{t}$ . We note that $γ - λ \geq 0$ and $δ - θ \geq 0$ guarantee the positive semi-definiteness of ${AA}^{'}$ and ${BB}^{'}$ , respectively. Thus, algebraic operations directly yield joint priors with bounded support $0 < λ \leq γ < 1 - δ \leq 1 - θ$ or $0 < θ \leq δ < 1 - γ \leq 1 - λ$ . However, such priors are inapplicable unless we transform the constraints into simple inequalities. As detailed later in the data analysis section, we recommend that each parameter be uniformly distributed, restricted to the associated stationary region.
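One simple way to visualize a uniform prior restricted to this region is rejection sampling: draw each parameter uniformly on $(0, 1)$ and keep only the draws that satisfy $γ + δ < 1$ , $λ \leq γ$ and $θ \leq δ$ . The sketch below is an illustration under that assumption and is not the prior implementation used in the data analysis; the function name `sample_constrained_prior` is hypothetical.

```python
import numpy as np

def sample_constrained_prior(n, seed=0):
    """Rejection-sample (gamma, delta, lam, theta) uniformly on the bounded support."""
    rng = np.random.default_rng(seed)
    kept = []
    while len(kept) < n:
        gamma, delta, lam, theta = rng.uniform(0.0, 1.0, size=4)
        # stationarity of the variance equation and PSD of AA', BB'
        if gamma + delta < 1.0 and lam <= gamma and theta <= delta:
            kept.append((gamma, delta, lam, theta))
    return np.array(kept)

prior_draws = sample_constrained_prior(500)
gamma_s, delta_s, lam_s, theta_s = prior_draws.T
```

Every retained draw also satisfies $λ + θ < 1$ , because $λ + θ \leq γ + δ$ on this support.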

To study the impact of heterogeneity on the variance and covariance equations, suppose that the random-effects matrix, $W = \{ω_{ij}\}$ , follows the Wishart distribution with $N \times N$ scale matrix $S_{ω}$ and known degrees of freedom $N$ . This assumption ensures the positive definiteness of $H_{t}$ in (2.10). Moreover, we adopt a compound symmetry structure for $S_{ω}$ , with diagonal elements $σ_{a}^{2} + σ_{b}^{2}$ and off-diagonal elements $σ_{b}^{2}$ , such that $σ_{a}$ and $σ_{b}$ follow half-Cauchy $(0, 5)$ , see Gelman (2006).

3.2 Prior specification for the low-frequency volatility model

We adopt $v_{i k} \overset{iid}{\sim} N (0, σ_{v}^{2})$ , $k = 1, \dots, K$ , $i = 1, \dots, N$ , and $σ_{v} \sim$ half-Cauchy $(0, 5)$ , similarly $u_{j s} \overset{iid}{\sim} N (0, σ_{u}^{2})$ for $s = 1, \dots, S$ , $j = 1, \dots, P$ , and $σ_{u} \sim$ half-Cauchy $(0, 5)$ , as proposed by Gelman (2006) and Polson and Scott (2012). To allow intercept heterogeneity and slope homogeneity, let the coefficients of the low-rank spline functions follow independent normal distributions, that is, $ϕ_{i 0} \overset{iid}{\sim} N (0, σ_{ϕ}^{2})$ , $σ_{ϕ} \sim$ half-Cauchy $(0, 5)$ , $ϕ_{1} \sim N (0, 1)$ , $ψ_{j 0} \overset{iid}{\sim} N (0, σ_{ψ}^{2})$ , $σ_{ψ} \sim$ half-Cauchy $(0, 2.5)$ and $ψ_{1} \sim N (0, 1)$ (Ruppert et al.(2003); Durban and Currie (2003); Zhang et al.(1998)).

3.3 Parameter estimation via Hamiltonian Monte Carlo

Hamiltonian Monte Carlo is an effective computational approach (Duane et al.(1987); Neal (1994)) when dealing with complex modelling problems. It uses an approximate Hamiltonian dynamics simulation based on derivatives of density functions to generate an efficient sample from posterior density functions. Neal (2011) and Betancourt and Girolami (2015) detailed some theoretical and practical topics.

Let $ϑ$ be the vector of all parameters in (2.9) and $ϕ$ be a vector of auxiliary momentum variables, independent of $ϑ$ , such that $(ϑ, ϕ)$ is jointly drawn from $π (ϑ, ϕ) = π (ϕ| ϑ) π (ϑ)$ . The momentum $ϕ$ is commonly assumed to follow $N (0, Σ_{ϕ})$ . The covariance matrix $Σ_{ϕ}$ plays the role of a Euclidean metric to rotate and scale the target density, and may be set to the identity matrix, estimated from warm-up draws, or restricted to a diagonal matrix $σ_{ϕ}^{2} I$ (Betancourt and Stein (2011)). The total energy or Hamiltonian function is

\begin{matrix} H (ϑ, ϕ) & = & - \log π (ϑ, ϕ) \\ & = & - \log π (ϕ| ϑ) - \log π (ϑ) \\ & = & F (ϕ| ϑ) + L (ϑ), \end{matrix}

(3.2)

where $F (ϕ| ϑ) = - \log π (ϕ| ϑ)$ denotes the kinetic energy and $L (ϑ) = - \log π (ϑ)$ is called the potential energy. For the independent case $π (ϕ| ϑ) = π (ϕ)$ , the kinetic energy equals $\frac{1}{2} ϕ^{'} Σ_{ϕ}^{- 1} ϕ$ up to an additive constant, and the joint density reduces to

π (ϑ, ϕ) \propto π (ϑ) \times \exp (- \frac{1}{2} ϕ^{'} Σ_{ϕ}^{- 1} ϕ) \propto \exp (- H (ϑ, ϕ)) .

In continuous time $t$ , the position and momentum evolve according to the following Hamilton equations

\frac{\partial ϑ}{\partial t} = \frac{\partial H (ϑ, ϕ)}{\partial ϕ} = \frac{\partial F (ϕ | ϑ)}{\partial ϕ} = Σ_{ϕ}^{- 1} ϕ,

(3.3)

\frac{\partial ϕ}{\partial t} = - \frac{\partial H (ϑ, ϕ)}{\partial ϑ} = - \frac{\partial L (ϑ)}{\partial ϑ} = - \nabla_{ϑ} L (ϑ),

(3.4)

where $\nabla_{ϑ} L (ϑ)$ is the gradient of $L (ϑ)$ with respect to $ϑ$ . Performing the simulation process demands a vector of auxiliary momentum variables. Then, the first derivative of the log-density function is computed for the unknown quantities to generate an efficient exploration of the posterior distribution. Numerical algorithms are needed to solve (3.3) and (3.4). The HMC benefits from a numerical integration algorithm, the so-called leapfrog or Störmer-Verlet integrator, to produce reliable solutions of the Hamiltonian equations (Leimkuhler and Reich (2004)). The leapfrog algorithm discretizes time into short intervals of length $ε$ and starts by drawing a new momentum sample independently of the parameter values and the previous momentum value. It then alternates half-step updates of the momentum and full-step updates of the position as follows

\begin{matrix} φ & \leftarrow & φ - \frac{ε}{2} \nabla_{ϑ} L (ϑ), \\ ϑ & \leftarrow & ϑ + ε Σ_{φ}^{- 1} φ, \\ φ & \leftarrow & φ - \frac{ε}{2} \nabla_{ϑ} L (ϑ), \end{matrix}

for some small step-size $ε > 0$ . These steps repeat $m$ times, and the resulting state $(ϑ^{*}, ϕ^{*})$ is obtained after a total simulated time of $m \times ε$ . In practice, the Metropolis acceptance step keeps the proposed $(ϑ^{*}, ϕ^{*})$ with probability

\min [1, \exp (H (ϑ, φ) - H (ϑ^{*}, φ^{*}))],

otherwise, the previous parameter value is returned for the next draw and used as an initial value for the next iteration. A summary of the HMC algorithm is as follows

1. initialize parameters, set initial values $ϑ^{(0)}$ and let $l = 1$ ,

2. generate $φ^{*} \sim N (0, I)$ and $u \sim U (0, 1),$

3. set $(ϑ, φ) = (ϑ^{(l - 1)}, φ^{*})$ and $H = H (ϑ, φ),$

4. repeat the leapfrog solution $m$ times

(a) $φ^{*} = φ^{*} - \frac{ε}{2} \nabla_{ϑ} L (ϑ^{(l - 1)}),$

(b) $ϑ^{(l - 1)} = ϑ^{(l - 1)} + ε φ^{*},$

(c) $φ^{*} = φ^{*} - \frac{ε}{2} \nabla_{ϑ} L (ϑ^{(l - 1)}),$

5. set $(ϑ^{*}, φ^{*}) = (ϑ^{(l - 1)}, φ^{*})$ and $H^{*} = H (ϑ^{*}, φ^{*}),$

6. compute $ρ = \min [1, \exp (H - H^{*})],$

7. set $ϑ^{(l)} = ϑ^{*}$ if $ρ > u$ , otherwise set $ϑ^{(l)} = ϑ,$

8. let $l = l + 1$ and return to step 2 until convergence.
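The summary above translates almost line by line into code. The following Python sketch is a minimal illustration with an identity metric ( $Σ_{ϕ} = I$ ), a fixed step size and a fixed number of leapfrog steps; it is not the adaptive sampler used by Stan. The function name `hmc`, its arguments, and the toy standard bivariate normal target are assumptions made for illustration.

```python
import numpy as np

def hmc(neg_log_post, grad, theta0, n_draws, eps=0.1, m=20, seed=0):
    """Draw from exp(-neg_log_post) with identity-metric HMC and leapfrog steps."""
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float)
    draws = np.empty((n_draws, theta.size))
    accepted = 0
    for l in range(n_draws):
        phi = rng.standard_normal(theta.size)        # fresh momentum ~ N(0, I)
        u = rng.uniform()
        H = neg_log_post(theta) + 0.5 * phi @ phi    # energy at the current state
        th, ph = theta.copy(), phi.copy()
        for _ in range(m):                           # m leapfrog steps
            ph = ph - 0.5 * eps * grad(th)           # half step for the momentum
            th = th + eps * ph                       # full step for the position
            ph = ph - 0.5 * eps * grad(th)           # second momentum half step
        H_new = neg_log_post(th) + 0.5 * ph @ ph
        rho = np.exp(min(0.0, H - H_new))            # min[1, exp(H - H*)]
        if rho > u:                                  # Metropolis acceptance
            theta = th
            accepted += 1
        draws[l] = theta                             # rejected: keep previous value
    return draws, accepted / n_draws

# Toy target (an assumption for illustration): standard bivariate normal
draws, rate = hmc(lambda th: 0.5 * th @ th, lambda th: th,
                  theta0=np.zeros(2), n_draws=2000, eps=0.2, m=10)
```

For this toy target the sample means and variances of the draws should be close to 0 and 1, and the acceptance rate should be high because the leapfrog energy error is small for a step size of this order.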

To perform the data analysis in the empirical studies, we employ the Stan probabilistic programming language, which uses the HMC algorithm. A Stan programme is first translated to a C++ programme by the Stan compiler, the so-called stanc. Next, the C++ programme is compiled into a self-contained platform-specific executable. We use the RStan library, the R interface to Stan, see Carpenter et al.(2017), and ‘GitHub, https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started’. RStan provides initialization, sampling, tuning controls and approximation facilities for the posterior analysis.

3.4 Evaluating the predictive accuracy

The posterior predictive distribution (PPD) has often been used to check model performance. For $N$ assets and total time $T$ , let $ϑ$ denote the parameter vector and $ϵ$ the observed log-returns. The posterior density of $ϑ$ , given $ϵ$ , measures the uncertainty of estimates. Let $ϵ_{new}$ be a vector of unobserved log-returns at a forthcoming time. The PPD is computed by

\begin{matrix} π (ϵ_{new} ∣ ϵ) & = & \int_{ϑ} π (ϵ_{new} ∣ ϑ, ϵ) π (ϑ ∣ ϵ) d ϑ \\ & \approx & \sum_{l = 1}^{L} L^{- 1} π (ϵ_{new} ∣ ϑ^{(l)}, ϵ), \end{matrix}

where $L$ is the number of Monte Carlo draws retained after convergence (Gelman et al.(2013)).
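The Monte Carlo approximation of the PPD amounts to averaging the conditional density (3.1) over posterior draws. The sketch below assumes that each draw $ϑ^{(l)}$ has already been converted into the covariance matrix it implies for the forthcoming period; the function names and the toy inputs are hypothetical.

```python
import numpy as np

def log_mvn_density(eps, Sigma):
    """Log of the zero-mean normal density in (3.1), via a Cholesky factor."""
    chol = np.linalg.cholesky(Sigma)
    z = np.linalg.solve(chol, eps)             # z'z equals eps' Sigma^{-1} eps
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (eps.size * np.log(2.0 * np.pi) + log_det + z @ z)

def posterior_predictive_density(eps_new, Sigma_draws):
    """Average the density of eps_new over the covariance implied by each draw."""
    logs = np.array([log_mvn_density(eps_new, S) for S in Sigma_draws])
    mx = logs.max()                            # subtract the max for stability
    return np.exp(mx) * np.mean(np.exp(logs - mx))

# Toy check with two hypothetical posterior draws for N = 2 assets
eps_new = np.zeros(2)
ppd_same = posterior_predictive_density(eps_new, [np.eye(2)] * 5)
ppd_mixed = posterior_predictive_density(eps_new, [np.eye(2), 4.0 * np.eye(2)])
```

When all draws imply the same covariance matrix, the average reduces to a single normal density, which gives a quick sanity check of the implementation.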

3.5 Optimization-based inference

Denoting by $π (ϑ| ϵ)$ the posterior density, the maximum a posteriori (MAP) estimate is given by

\begin{matrix} {\hat{ϑ}}_{MAP} & = & \arg \max_{ϑ} (π (ϑ| ϵ)) \\ & = & \arg \max_{ϑ} (π (ϵ| Σ_{1}, \dots, Σ_{T}) \times π (ϑ)), \end{matrix}

where $\arg \max_{ϑ} (\cdot)$ returns the value of $ϑ$ that maximizes the given function. The maximum likelihood estimation (MLE) and the MAP estimation are closely related. Under a flat prior over the entire range of $ϑ$ , the posterior mode coincides with the MLE of the parameters. In some situations, such as predicting volatility in stock markets, the MAP estimation may produce more accurate results than the MLE because it uses prior information about parameter values of a particular form. Technical details are given by Bassett and Deride (2019).
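The relation between the MAP estimate and the MLE can be seen on a deliberately simple example that is unrelated to the volatility model: estimating a normal mean with unit variance. The sketch below maximizes the log-posterior over a grid; the data, the grid and the function name `map_estimate` are assumptions for illustration.

```python
import numpy as np

def map_estimate(log_lik, log_prior, grid):
    """Return the grid point maximizing log-likelihood + log-prior."""
    log_post = np.array([log_lik(th) + log_prior(th) for th in grid])
    return grid[np.argmax(log_post)]

x = np.array([0.8, 1.2, 1.0, 1.4, 0.6])            # hypothetical data, mean 1.0
log_lik = lambda mu: -0.5 * np.sum((x - mu) ** 2)   # N(mu, 1) log-likelihood
grid = np.linspace(-2.0, 3.0, 5001)                 # step 0.001

mle_like = map_estimate(log_lik, lambda mu: 0.0, grid)             # flat prior
map_norm = map_estimate(log_lik, lambda mu: -0.5 * mu ** 2, grid)  # N(0, 1) prior
```

With the flat prior the maximizer coincides with the sample mean, whereas the $N (0, 1)$ prior shrinks the estimate towards zero, to $\sum x / (n + 1) = 5 / 6$ in this example.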

4 Simulation and empirical application

We first conduct simulation studies to investigate the statistical properties of the time-varying GARCH mixed-effects model and then analyse the stock market dataset to model volatility and co-volatility.

4.1 Simulation experiments

We simulate the unexpected returns, conditioned on latent variables and innovation terms, and generate sample paths from the proposed time-varying multivariate model. Financial time-series data exhibit specific features that should be included in the simulation study. Specifically, the variability of the simulated series is taken into account by time-varying coefficients. We expect non-stationary processes with high variation to exhibit extreme peaks. Also, variance components control volatility and co-volatility in the sample paths.

We first conduct a preliminary simulation study as a Bayesian sensitivity analysis. In particular, to evaluate the effect of the priors and the choice of hyperparameters on the posterior distribution, we examine a series of prior distributions, especially those for variance components introduced by Gelman (2006), and study convergence by generating parallel chains. We also define flat priors for all parameters, respecting the bounded supports already mentioned. The simulation findings reveal that the priors in Section 3 and flat priors produce almost the same estimation results, implying that the model fitting is not very sensitive to the choice of priors. For fixed hyperparameters in the low-frequency equations, the results of the simulated experiments are obtained by computing means and mean-squared errors (MSEs) for all parameters of the high-frequency equations. We organize several simulation plans to investigate various properties of our proposed model:

Simulation 1: The behaviour of the model as the number of individuals varies is of interest. We designed simulation scenarios with dimensions $N = 10$ , $15$ , $20$ and $40$ over length $T = 180$ . After successful convergence, Table 1 reports our findings, including the parameter estimates and the latent effects with their MSEs. The bias tends to decrease as $N$ gets large, that is, the estimates move closer to, or remain nearly equal to, the true values. Hence, the results support the consistency of the estimates for various $N$ .

Table 1

Simulation results for the proposed model with different $N$ and $T = 180$

		$γ$	$λ$	$δ$	$θ$	$σ_{a}$	$σ_{b}$
	True	0.2500	0.1000	0.4000	0.2000	0.1000	0.1000
$N = 10$	Mean	0.2552	0.1410	0.3692	0.1823	0.2238	0.0552
	MSE	0.0012	0.0044	0.0046	0.0109	0.0159	0.0033
$N = 15$	Mean	0.2734	0.1138	0.4008	0.1968	0.1012	0.0901
	MSE	0.0013	0.0012	0.0023	0.0095	0.0001	0.0005
$N = 20$	Mean	0.2239	0.1017	0.3670	0.1249	0.1123	0.1094
	MSE	0.0012	0.0005	0.0036	0.0122	0.0002	0.0005
$N = 40$	Mean	0.2524	0.0955	0.3859	0.1665	0.0891	0.1403
	MSE	0.0002	0.0002	0.0012	0.0120	0.0002	0.0019

Simulation 2: The model was assessed in four simulation experiments with $N = 10$ and $T = 180$ . For all series in Table 2, the estimates are close to their true values, with MSEs near zero. Thus, the results confirm that the proposed model performs satisfactorily in terms of bias and precision.

Table 2

Simulation results for the proposed model with $N = 10$ and $T = 180$

		$γ$	$λ$	$δ$	$θ$	$σ_{a}$	$σ_{b}$
Series I	True	0.5000	0.2000	0.4000	0.1000	0.2000	0.2000
	Mean	0.4893	0.2252	0.4152	0.1268	0.2134	0.2511
	MSE	0.0019	0.0043	0.0013	0.0074	0.0012	0.0086
Series II	True	0.2500	0.1000	0.2500	0.1000	0.1000	0.1000
	Mean	0.2351	0.1382	0.2655	0.1308	0.1038	0.0913
	MSE	0.0015	0.0041	0.0078	0.0089	0.0002	0.0007
Series III	True	0.2500	0.1000	0.4000	0.2000	0.1000	0.1000
	Mean	0.2552	0.1410	0.3692	0.1823	0.2238	0.0552
	MSE	0.0012	0.0044	0.0046	0.0109	0.0159	0.0033
Series IV	True	0.5000	0.1000	0.2500	0.2000	0.2000	0.3000
	Mean	0.5751	0.0932	0.2549	0.1755	0.1607	0.2887
	MSE	0.0079	0.0022	0.0017	0.0051	0.0026	0.0056

Simulation 3: To compare the proposed model with some existing multivariate GARCH models, we conduct an additional simulation study. We first generate $N = 10$ vectors of unexpected returns from the time-varying GARCH mixed-effects model $ϵ_{t} = Σ_{t}^{1 / 2} e_{t}$ with $T = 180$ , where the covariance matrix $Σ_{t}$ is given by (2.9) and the innovation term $e_{t}$ follows the contaminated-normal distribution ${CN}_{N} (0, I, c_{1}, c_{2})$ , in which $c_{1}$ and $c_{2}$ determine the contamination probability and the degree of contamination, respectively (Arellano-Valle et al. (2018)). We set $γ = 0.25$ , $λ = 0.1$ , $δ = 0.25$ , $θ = 0.1$ , $σ_{ϕ} = 0.1$ , $σ_{ψ} = 0.25$ , $σ_{u} = 0.25$ , $σ_{v} = 0.1$ , $σ_{a} = 0.1$ , $σ_{b} = 0.1$ , $ϕ_{1} = 0.1$ and $ψ_{1} = 0.1$ . We also assume $c_{1} \sim Uniform (0, 1)$ and $c_{2} \sim Uniform (1, 5)$ . Next, we fit the following competing multivariate models to the generated samples:

M1: The proposed model in Section 2.

M2: The proposed model with co-volatilities eliminated by assuming $Σ_{t} = diag (τ_{ii, t} h_{ii, t})$ , where $h_{ii, t}$ is defined in (2.5) with $ω_{ii} \sim Gamma (s / 3, s / 3)$ and $s \sim Inv\text{-}Gamma (3, 3)$ , and $τ_{ii, t} = \exp (f_{i} (t))$ , with $f_{i} (t)$ given in (2.14).

M3: An extension of the univariate spline-GARCH model proposed by Engle and Rangel (2008) to the multivariate case, for which $Σ_{t} = diag (τ_{ii, t} h_{ii, t})$ , where $h_{ii, t} = (1 - γ - δ) + γ (ϵ_{i, t - 1}^{2} / τ_{ii, t - 1}) + δ h_{ii, t - 1}$ and $τ_{ii, t} = \exp (ϕ_{i 0} + ϕ_{i 1} t + \sum_{k = 1}^{K} v_{ik} (t - κ_{k})_{+}^{2})$ . The sharpness of each cycle is governed by the fixed coefficients $ϕ_{i 0}$ and $ϕ_{i 1}$ , and the basis function is defined by $(t - κ_{k})_{+} = (t - κ_{k})$ if $t > κ_{k}$ and $0$ otherwise. The knots $κ_{k}$ are the sample quantiles of time $t$ corresponding to probabilities $k / (K + 1)$ for $k = 1, \dots, K$ . We set $K = 7$ . The difference between models M2 and M3 is that, for M3, the variance and covariance equations are homogeneous and quadratic spline basis functions are used.

M4: The Matrix-Diagonal model proposed by Ding and Engle (2001), with $Σ_{t} = {CC}^{'} + {AA}^{'} ⊙ ϵ_{t - 1} ϵ_{t - 1}^{'} + {BB}^{'} ⊙ Σ_{t - 1}$ , where ${CC}^{'}$ , ${AA}^{'}$ and ${BB}^{'}$ are all positive semi-definite, which implies that $Σ_{t}$ is positive definite (PD) for all $t > 1$ provided that the initial covariance matrix $Σ_{1}$ is PD. We adopt a compound-symmetry structure and set ${CC}^{'} = (ω_{1} - ω_{2}) I_{N} + ω_{2} J_{N}$ , ${AA}^{'} = (γ - λ) I_{N} + λ J_{N}$ and ${BB}^{'} = (δ - θ) I_{N} + θ J_{N}$ . Also, fixing $ω_{1}$ , $γ$ and $δ$ for the variance equations and $ω_{2}$ , $λ$ and $θ$ for the covariance equations makes the covariance matrix homogeneous.

M5: The heterogeneous multivariate GARCH model (Cermeño and Grier (2006)). Heterogeneity in $W$ accounts for variability across individuals and makes the model more practical. The model differs from M4 in that the covariance matrix is allowed to be heterogeneous: instead of setting ${CC}^{'}$ to be compound symmetric, we assume that ${CC}^{'} = W$ follows the Wishart distribution, which guarantees that $Σ_{t}$ is PD for all $t$ provided that $Σ_{1}$ is PD.
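The M3 recursion can be made concrete with a short simulation. The following is a minimal sketch for a single series, assuming standard normal innovations and equally spaced knots as a stand-in for the sample quantiles of $t$ ; the function name `simulate_spline_garch` and the values of `phi0`, `phi1` and `v` are hypothetical and chosen only for illustration, while $γ = δ = 0.25$ , $K = 7$ and $T = 180$ follow the settings quoted in the text.

```python
import math
import random


def simulate_spline_garch(T, gamma, delta, phi0, phi1, v, seed=0):
    """Simulate one univariate path from the M3 recursion (illustrative only)."""
    rng = random.Random(seed)
    K = len(v)
    # Knots at (approximately) the k/(K+1) quantiles of the time index 1..T.
    knots = [k * (T + 1) / (K + 1) for k in range(1, K + 1)]

    def tau(t):
        # Low-frequency part: exp of a quadratic truncated-power spline in t.
        s = phi0 + phi1 * t
        for vk, kk in zip(v, knots):
            s += vk * max(t - kk, 0.0) ** 2
        return math.exp(s)

    eps, h_path, tau_path = [], [], []
    h = 1.0  # start the unit-mean GARCH component at its unconditional level
    for t in range(1, T + 1):
        if t > 1:
            # h_t = (1 - gamma - delta) + gamma * eps_{t-1}^2 / tau_{t-1} + delta * h_{t-1}
            h = (1.0 - gamma - delta) + gamma * eps[-1] ** 2 / tau_path[-1] + delta * h
        tau_t = tau(t)
        # Return with variance tau_t * h_t (assumed Gaussian innovation).
        eps.append(math.sqrt(tau_t * h) * rng.gauss(0.0, 1.0))
        h_path.append(h)
        tau_path.append(tau_t)
    return eps, h_path, tau_path


# Hypothetical spline coefficients; K = 7 knots and T = 180 as in the text.
eps, h_path, tau_path = simulate_spline_garch(
    T=180, gamma=0.25, delta=0.25, phi0=-6.0, phi1=0.001,
    v=[1e-5, -2e-5, 1e-5, 1e-5, -2e-5, 1e-5, 1e-5],
)
```

Plotting `tau_path` against `h_path` shows the intended separation: the spline term moves slowly, while the GARCH term reacts to the previous squared return.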

Existing MGARCH models have specific features, but some, such as M4 and M5, cannot separate high- and low-frequency volatilities. Routine practice uses information criteria for model choice. Let $ϑ$ be a $k \times 1$ vector of all parameters estimated in a fitted model, with deviance $D (ϑ) = - 2 \log (π (ϑ | ϵ))$ . The most widely used criteria are (a) the Akaike information criterion, $AIC (ϑ) = D (ϑ) + 2 k$ , (b) the Bayesian information criterion, $BIC (ϑ) = D (ϑ) + \log (T) k$ , (c) the deviance information criterion, $DIC (ϑ) = - 2 lpd (ϑ) + 2 k_{DIC}$ , where $lpd (ϑ) = \log π (ϵ | ϑ)$ denotes the log-predictive density of the data given a point estimate of the fitted model (Spiegelhalter et al. (2002); Gelman et al. (2014)) and $k_{DIC} = 2 \{\log π (ϵ | ϑ) - E_{post} (\log π (ϵ | ϑ))\}$ is the penalty function, and (d) the Watanabe-Akaike information criterion, $WAIC (ϑ) = - 2 lppd (ϑ) + 2 k_{WAIC}$ , where the log point-wise posterior predictive density is $lppd (ϑ) = \sum_{t = 1}^{T} \log \int_{ϑ} π (ϵ_{t} | ϑ, ϵ_{t - 1}) π_{post} (ϑ) d ϑ$ and $k_{WAIC} = 2 \sum_{t = 1}^{T} \{\log E_{post} (π (ϵ_{t} | ϑ, ϵ_{t - 1})) - E_{post} (\log π (ϵ_{t} | ϑ, ϵ_{t - 1}))\}$ . WAIC estimates the out-of-sample prediction accuracy of a fitted Bayesian model using the log-likelihood evaluated at the posterior simulations of the parameter values (Watanabe and Opper (2010); Vehtari et al. (2017)).
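As an illustration of how the DIC and WAIC definitions above translate into computation, the sketch below evaluates them from posterior draws of the log-likelihood. It assumes that the point-wise values $\log π (ϵ_{t} | ϑ_{s}, ϵ_{t - 1})$ have already been stored for each posterior draw $s$ ; the function names and the input layout are hypothetical and are not taken from the authors' code.

```python
import math


def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(x - m) for x in values) / len(values))


def waic(loglik):
    """WAIC from loglik[s][t] = log pi(eps_t | theta_s, eps_{t-1}) for draw s."""
    S, T = len(loglik), len(loglik[0])
    lppd, k_waic = 0.0, 0.0
    for t in range(T):
        col = [loglik[s][t] for s in range(S)]
        lme = log_mean_exp(col)      # log E_post pi(eps_t | theta, eps_{t-1})
        mean_log = sum(col) / S      # E_post log pi(eps_t | theta, eps_{t-1})
        lppd += lme
        k_waic += 2.0 * (lme - mean_log)
    return -2.0 * lppd + 2.0 * k_waic, lppd, k_waic


def dic(loglik_at_point, loglik_draws):
    """DIC from log pi(eps | theta_hat) and the draws log pi(eps | theta_s)."""
    k_dic = 2.0 * (loglik_at_point - sum(loglik_draws) / len(loglik_draws))
    return -2.0 * loglik_at_point + 2.0 * k_dic, k_dic
```

The posterior expectations are replaced by averages over draws, so the accuracy of both criteria depends on the number and quality of the posterior samples.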

Table 3

Information criteria for fitted models

	M1	M2	M3	M4	M5	True Model
AIC	$- 323.80$	$- 304.06$	$- 256.45$	$- 160.38$	$- 233.01$	$- 354.40$
BIC	$- 320.49$	$- 302.27$	$- 250.32$	$- 158.59$	$- 231.22$	$- 346.57$
DIC	$- 1115.58$	$- 1071.20$	$- 1034.44$	$- 628.68$	$- 825.23$	$- 1264.52$
WAIC	$- 1169.45$	$- 1063.92$	$- 1023.60$	$- 616.92$	$- 812.24$	$- 1293.70$

For each model, we ran the Stan code until successful convergence. Table 3 shows that the proposed model M1 performs satisfactorily compared to the competing models, since its criteria values are the closest to those of the true model. Notice that the data-generating process was based on the contaminated-normal distribution, which produces outlying points. Thus, our results generally reveal that the fitted models involving cross-covariance, especially those accommodating more flexibility and variability through random effects, can deal with contaminated data, implying robustness to possible outlying points. The worst is M4, since it accounts for neither heterogeneity nor the isolation of volatility and co-volatility.

4.2 A study on volatility and co-volatility in 10 top US stock indexes

Stock market data analysis is of long-standing interest in economic and financial studies. As emphasized, long-term and short-term changes in a stock price can have significant effects on other stocks and on macroeconomic factors. Modern volatility regression models, such as the widely used GARCH, are available to investigate the impact of such effects. This section analyses the weekly log-returns of the 10 top stock indices from the Consumer Staples sector. Our study contains Walmart (WMT), Procter and Gamble (PG), Coca-Cola (KO), Philip Morris International (PM), Altria Group (MO), Estee Lauder (EL) and Colgate-Palmolive (CL) from the New York Stock Exchange (NYSE), and PepsiCo (PEP), Costco (COST) and Mondelez International (MDLZ) from the National Association of Securities Dealers Automated Quotations exchange (NASDAQ). The data cover 110 weeks from 6 November 2017 to 2 December 2019 and are freely available at https://finance.yahoo.com. We aim to model high- and low-frequency volatility for each stock index appropriately and to study the effect of a change in one stock price on the other stocks. Figure 1 presents the time-series plots of the log-returns of the 10 stocks. None of the series appears to follow a stationary process.
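For readers reproducing the data preparation, weekly log-returns are the differences of log prices, $r_{t} = \log (P_{t} / P_{t - 1})$ . The snippet below is a minimal sketch; the price values are hypothetical placeholders, not the downloaded series.

```python
import math


def log_returns(prices):
    """Log-returns r_t = log(P_t / P_{t-1}) from a list of closing prices."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]


# Hypothetical weekly closing prices for one stock.
weekly_close = [100.0, 102.0, 101.0, 105.0]
returns = log_returns(weekly_close)
```

A series of $n$ prices yields $n - 1$ returns, and the returns sum to the log of the ratio of the last price to the first.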

Figure 1

Time-series plots for the log-returns of 10 stocks

We now assess the performance of the proposed model in an empirical application and compare it with current models for cross-correlated financial data. It is instructive to highlight the model's ability to isolate high- and low-frequency volatilities. The estimation results in Table 4 show that our model fits best according to AIC, BIC and DIC, while M5 attains a marginally lower WAIC. Although the heterogeneous multivariate GARCH model (M5) fits fairly well, it cannot isolate frequencies, unlike our model.

Table 4

Model comparison for the models fitted to stock market data

	M1	M2	M3	M4	M5
AIC	$- 3035.67$	$- 2822.87$	$- 2777.78$	$- 2935.41$	$- 3023.49$
BIC	$- 3035.19$	$- 2822.61$	$- 2776.88$	$- 2935.14$	$- 3023.23$
DIC	$- 10873.65$	$- 10161.52$	$- 10141.59$	$- 10633.13$	$- 10830.17$
WAIC	$- 10771.08$	$- 10153.55$	$- 10145.61$	$- 9091.47$	$- 10839.71$

As mentioned in Section 3.3, the RStan library in R implements the HMC simulation approach. Stan can also return the posterior mode, that is, the MAP estimate, through optimization. Stan's default optimizer is the L-BFGS quasi-Newton algorithm, with Newton's method available as an alternative (Carpenter et al. (2017)). Also, under non-informative (flat) priors, the MAP estimates coincide with the MLEs. Table 5 presents the estimation results for the given period.

Table 5

Bayesian results, standard deviations, MLE and MAP estimates, and rhat values

Parameters	Mean	sd	MLE	MAP	rhat
$γ$	0.0995	0.0344	0.1473	0.1090	1.0018
$λ$	0.0220	0.0200	0.0269	0.0111	1.0017
$δ$	0.2520	0.1311	0.2067	0.1777	1.0004
$θ$	0.1569	0.1436	0.2067	0.0571	1.0004
$σ_{ϕ}$	0.0728	0.0411	0.1871	0.0428	1.0003
$σ_{ψ}$	0.4171	0.0944	0.4211	0.4001	0.9999
$σ_{u}$	0.2576	0.1217	0.2347	0.1785	1.0002
$σ_{v}$	0.0953	0.0681	0.3273	0.0666	1.0220
$σ_{a}$	0.0163	0.0057	0.0100	0.0125	1.0002
$σ_{b}$	0.0416	0.0209	0.0152	0.0277	1.0041
$ϕ_{1}$	0.0013	0.0700	0.5149	0.0010	1.0000
$ψ_{1}$	0.3549	0.1364	0.2138	0.3571	1.0551
$lp$ –	5016.7250	148.9654	–	–	0.9999

The random effects $ω_{ij}$ , $i, j = 1, \dots, N$ , deliver the heterogeneity in the variance and covariance equations, and their features are of special concern in our analysis. The matrix of random effects is commonly assumed to follow the Wishart distribution, which plays an important role in multivariate analysis through its connection to inference about covariance matrices. Our modelling methodology uses a practical algorithm introduced by Odell and Feiveson (1966) to generate an $N$ -dimensional Wishart distribution with $N$ degrees of freedom and scale matrix $S_{w}$ . In the Bayesian context, $S_{w}$ expresses the prior uncertainty about $W$ by measuring its dispersion, and $N$ expresses the degree of belief in $S_{w}$ . To operationalize the computation, let $S_{w}$ be compound symmetric with parameters $σ_{a}^{2}$ and $σ_{b}^{2}$ , which assigns one common dispersion to the heterogeneity in the variance equations and another common dispersion to the heterogeneity in the covariance equations. We have $E (W | S_{w}, N) = N S_{w}$ , implying that $E (W_{ii}) = N (σ_{a}^{2} + σ_{b}^{2})$ for all $i$ and $E (W_{ij}) = N σ_{b}^{2}$ for all $i \neq j$ . From Table 5, both parameters $σ_{a}^{2}$ and $σ_{b}^{2}$ are significant, confirming the stock-index heterogeneity. This means that the analyst should take into account the effect of volatility in one stock on the other indices.
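The moment identity $E (W | S_{w}, N) = N S_{w}$ can be checked numerically. The sketch below draws Wishart matrices through the Bartlett decomposition, a standard triangular construction in the same spirit as the Odell and Feiveson (1966) procedure; it is not claimed to be the authors' implementation. The compound-symmetric form $S_{w} = σ_{a}^{2} I_{N} + σ_{b}^{2} J_{N}$ is inferred from the expectations quoted above, and all function names are hypothetical.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L' = S for a symmetric positive-definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def compound_symmetric(n, sigma_a, sigma_b):
    """S_w = sigma_a^2 I + sigma_b^2 J (assumed parameterization)."""
    return [[sigma_b ** 2 + (sigma_a ** 2 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def rwishart(S, df, rng):
    """One Wishart(df, S) draw, W = (L A)(L A)', via the Bartlett decomposition."""
    n = len(S)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal: square root of a chi-square variate with df - i degrees of freedom.
        A[i][i] = math.sqrt(rng.gammavariate((df - i) / 2.0, 2.0))
        for j in range(i):
            A[i][j] = rng.gauss(0.0, 1.0)  # strictly lower part: standard normals
    LA = matmul(cholesky(S), A)
    return matmul(LA, transpose(LA))
```

With $N = 4$ and $σ_{a} = σ_{b} = 0.1$ , for instance, averaging many draws of `rwishart(compound_symmetric(4, 0.1, 0.1), 4, random.Random(1))` should approach $N S_{w}$ , that is, $0.08$ on the diagonal and $0.04$ off the diagonal.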

One convergence diagnostic in Stan is rhat. It is close to one for all parameters in Table 5, indicating that the chains have converged to a common stationary distribution. Stan uses a more conservative version of rhat than the usual form in other packages, such as Coda (Plummer et al. (2006)). By default, it splits each chain in half to identify non-stationary chains (Gelman et al. (2013); Carpenter et al. (2017)). Trajectories of all parameters over the simulations, with changes within a reasonable tolerance level (not shown here), indicate well-mixed chains. Figure 2 shows the posterior predictive histograms.
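To make the splitting idea concrete, the following sketch computes the basic split version of rhat for a single scalar parameter, following the description in Gelman et al. (2013). It is an illustration only: the function name is hypothetical, and Stan's own implementation may differ in detail from this simplified form.

```python
import statistics


def split_rhat(chains):
    """Basic split-rhat for one parameter; chains is a list of equal-length draw lists."""
    halves = []
    for c in chains:
        n_half = len(c) // 2
        halves.append(c[:n_half])               # first half of the chain
        halves.append(c[len(c) - n_half:])      # second half of the chain
    n = len(halves[0])
    means = [statistics.fmean(h) for h in halves]
    # W: average within-half variance; B: n times the variance of the half means.
    W = statistics.fmean([statistics.variance(h) for h in halves])
    B = n * statistics.variance(means)
    var_plus = (n - 1) / n * W + B / n          # pooled estimate of posterior variance
    return (var_plus / W) ** 0.5
```

Because each chain is split, a chain that drifts over time produces two halves with different means, which inflates the between-half variance and pushes the statistic above one even when the full-chain means agree.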

Figure 2

Predictive posterior histograms

The individual high-frequency volatility $h_{ii, t}$ is modelled via GARCH, while the low-frequency volatility $τ_{ii, t}$ is modelled via the low-rank thin-plate spline. Asset returns are affected by political and economic conditions, while unobserved factors can also cause volatility. Figures 3–5 illustrate, respectively, the variances $σ_{ii, t}$ , the high-frequency components $h_{ii, t}$ and the low-frequency components $τ_{ii, t}$ for the log-returns of the 10 stocks. Over the given period, our results show that the smoothing correctly captures the $σ_{ii, t}$ function, and $τ_{ii, t}$ provides a suitably smooth version of the variance function $σ_{ii, t}$ .

Figure 3

Time-varying variance components for the log-return of 10 US stocks

Figure 4

High-frequency volatility in the log-return of 10 US stocks

Figure 5

Low-frequency volatility in the log-return of 10 US stocks ( $10^{3}$ scaled up)

Figure 6

(A) Time-varying covariance and (B) Low-frequency co-volatility of the WMT and 9 other log-returns ( $10^{3}$ scaled up)

To study the co-volatility in the covariance equation, consider the WMT index as an example. Figure 6 illustrates the low-frequency co-volatility and the time-varying covariance for the log-return of Walmart's stock and the nine other stocks. The low-frequency plots display the smooth patterns of the covariances. The panels describe the long-term changes of WMT in the presence of the other stocks. Short-term volatility immediately affects the log-return, while long-term volatility plays a role in decisions about future trades. For this dataset, the covariance structure is influenced more by long-term than by short-term changes. As seen here, the coefficients in the low-frequency model are significant, unlike those in the high-frequency model. This means that the volatility in one stock has a long-term impact on other stocks. When unexpected changes in one stock immediately impact other stocks, there is higher economic instability in the financial markets. Figure 7 presents the temporal heatmap of the time-varying covariance matrix at some selected time points over this period.

Figure 7

Temporal heatmap for the time-varying covariance matrix

Forecasting volatility is particularly important for companies concerned with portfolio management, and reliable forecasts allow brokers to trade stocks more confidently. There are various methods to estimate or predict volatility. An accurate prediction of a future market movement indicates that the proposed model is appropriately fitted. To evaluate prediction accuracy, we first remove the vector observed at time $T + 1$ from the dataset, fit the model to the remaining period, and then predict the observation vector $ϵ_{T + 1}$ . The difference between predicted and true values can be used to assess the fit (Gelman et al. (2013)). Table 6 shows that the observed values fall between the $25 %$ and $75 %$ credible bounds for eight of the ten stocks; the observed values for PM and MO lie below the lower bound. The $p$ -value for the Chi-squared test equals $0.075$ , implying that the difference between observed and predicted values is not significant at the $5 %$ level.

Table 6

The observed and predicted bounds of log-return

Name of Stock	$ϵ_{T + 1}$	$25 %$	$75 %$
WMT	0.0060	$- 0.0323$	0.0460
PG	$- 0.0015$	$- 0.0180$	0.0450
KO	0.0018	$- 0.0124$	0.0179
PEP	0.0062	$- 0.0047$	0.0739
PM	0.0045	0.0158	0.2640
COST	0.0027	$- 0.0202$	0.0400
MO	0.0020	0.0064	0.1047
MDLZ	0.0002	$- 0.0120$	0.0691
EL	0.0035	$- 0.0185$	0.0892
CL	0.0026	$- 0.0041$	0.0678
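The coverage of the bounds in Table 6 can be verified directly. The short check below copies the observed values and the $25 %$ and $75 %$ bounds from the table and counts how many observations fall inside their interval; the variable names are illustrative.

```python
# (observed eps_{T+1}, 25% bound, 75% bound), copied from Table 6.
table6 = {
    "WMT": (0.0060, -0.0323, 0.0460),
    "PG": (-0.0015, -0.0180, 0.0450),
    "KO": (0.0018, -0.0124, 0.0179),
    "PEP": (0.0062, -0.0047, 0.0739),
    "PM": (0.0045, 0.0158, 0.2640),
    "COST": (0.0027, -0.0202, 0.0400),
    "MO": (0.0020, 0.0064, 0.1047),
    "MDLZ": (0.0002, -0.0120, 0.0691),
    "EL": (0.0035, -0.0185, 0.0892),
    "CL": (0.0026, -0.0041, 0.0678),
}

inside = {name: lo <= obs <= hi for name, (obs, lo, hi) in table6.items()}
outside = [name for name, ok in inside.items() if not ok]
print(sum(inside.values()), "of", len(inside), "inside; outside:", outside)
# prints: 8 of 10 inside; outside: ['PM', 'MO']
```

Since the interval between the $25 %$ and $75 %$ bounds nominally contains half of the predictive mass, observing eight of ten values inside it does not by itself indicate poor calibration.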

Figure 8

Heatmap of the predicted covariance matrix

Moreover, the multivariate analysis can predict the cross-covariance of stocks, including their latent short- or long-run effects. Figure 8 displays a one-step-ahead forecast of the covariance matrix. It graphically represents the cross-dependence at time $T + 1$ : pairs of stocks with a higher covariance appear in darker colours, so the covariance magnitude becomes larger as the colours turn stronger.

5 Concluding remarks

The article introduced a flexible model for describing high- and low-frequency variations with novel data analysis methods. Specifically, we offered modelling strategies for cross-sectional time-series analysis with serially correlated innovation terms and cross-individual dependence. The related conditional variance and covariance components followed the spline-GARCH process. In particular, we proposed a non-parametric regression model of the penalized low-rank smoothing spline form to account for time trends in the variance and covariance equations. Our proposed strategy provides flexible modelling of the low-frequency volatility in equity markets. The model can differentiate non-parametrically between dynamic high- and low-frequency patterns of the variance-covariance structural equations and incorporate economic features to predict stock market volatility based on the time-series evidence. Many financial and macroeconomic time series are conditionally heteroscedastic, meaning that traditional estimation methods produce inefficient estimates. Analysing such data requires advanced numerical methods. The HMC sampler can estimate the related variance and covariance components in the normal case using computational Bayes. Future studies will aim to present flexible multivariate distributions to analyse such correlated data with skewed structures.

Supplementary materials

Supplementary materials for this article containing software codes are available from http://www.statmod.org/smij/archive.html, and the datasets are freely available from https://finance.yahoo.com.

Footnotes

Acknowledgments

We are grateful to the Editor and two anonymous referees for their constructive comments. We thank the Shiraz University Research Council for supporting this work.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship and/or publication of this article: The authors acknowledge the Iran National Science Foundation for partial financial support (INSF 96012003).

References

Arellano-Valle, Ferreira and Genton (2018) Scale and shape mixtures of multivariate skew-normal distributions. Journal of Multivariate Analysis, 166, 98–110.

Audrino and Bühlmann (2009) Splines for financial volatility. Journal of the Royal Statistical Society: Series B, 71, 655–670.

Bassett and Deride (2019) Maximum a posteriori estimators as a limit of Bayes estimators. Mathematical Programming, 174, 129–144.

Betancourt and Girolami (2015) Hamiltonian Monte Carlo for hierarchical models. In Current Trends in Bayesian Methodology with Applications, edited by Upadhyay, Singh, Dey and Loganathan. Ch. 4, pages 79–100. Oxfordshire: Taylor & Francis Group.

Betancourt and Stein (2011) The geometry of Hamiltonian Monte Carlo. arXiv preprint arXiv:1112.4118.

Bollerslev, Engle and Wooldridge (1988) A capital asset pricing model with time-varying covariances. Journal of Political Economy, 96, 116–131.

Carpenter, Gelman, Hoffman, Lee, Goodrich, Betancourt, Brubaker, Guo, Li and Riddell (2017) Stan: A probabilistic programming language. Journal of Statistical Software, 76, 1–32.

Cermeño and Grier (2006) Conditional heteroskedasticity and cross-sectional dependence in panel data: An empirical study of inflation uncertainty in the G7 countries. In Panel Data Econometrics: Volume 274 of Contributions to Economic Analysis, edited by Baltagi. Ch. 10, pages 259–277. Amsterdam: Elsevier.

Crainiceanu, Ruppert and Wand (2005) Bayesian analysis for penalized spline regression using WinBUGS. Journal of Statistical Software, 14, 1–24.

Ding and Engle (2001) Large scale conditional covariance matrix modeling, estimation and testing (Working Paper No Fin-01029). New York: NYU.

Duane, Kennedy, Pendleton and Roweth (1987) Hybrid Monte Carlo. Physics Letters B, 195, 216–222.

Durban and Currie (2003) A note on P-spline additive models with correlated errors. Computational Statistics, 18, 251–262.

Eilers and Marx (1996) Flexible smoothing with B-splines and penalties. Statistical Science, 11, 89–102.

Engle and Rangel (2008) The spline-GARCH model for low-frequency volatility and its global macroeconomic causes. The Review of Financial Studies, 21, 1187–1222.

Engle and Sokalska (2012) Forecasting intraday volatility in the US equity market. Multiplicative component GARCH. Journal of Financial Econometrics, 10, 54–83.

Gelman (2006) Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1, 515–534.

Gelman, Carlin, Stern, Dunson, Vehtari and Rubin (2013) Bayesian Data Analysis. New York: Chapman & Hall/CRC.

Gelman, Hwang and Vehtari (2014) Understanding predictive information criteria for Bayesian models. Statistics and Computing, 24, 997–1016.

Gurrin, Scurrah and Hazelton (2005) Tutorial in biostatistics: Spline smoothing with linear mixed models. Statistics in Medicine, 24, 3361–3381.

Leimkuhler and Reich (2004) Simulating Hamiltonian Dynamics. Cambridge: Cambridge University Press.

Neal (1994) An improved acceptance procedure for the hybrid Monte Carlo algorithm. Journal of Computational Physics, 111, 194–203.

Neal (2011) MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo, edited by Brooks, Gelman, Jones and Meng. Ch. 5. Boca Raton: CRC Press.

Odell and Feiveson (1966) A numerical procedure to generate a sample covariance matrix. Journal of the American Statistical Association, 61, 199–203.

Plummer, Best, Cowles and Vines (2006) CODA: Convergence diagnosis and output analysis for MCMC. R News, 6, 7–11.

Polson and Scott (2012) On the half-Cauchy prior for a global scale parameter. Bayesian Analysis, 7, 887–902.

Pourahmadi (2007) Cholesky decompositions and estimation of a covariance matrix: Orthogonality of variance–correlation parameters. Biometrika, 94, 1006–1013.

Rangel and Engle (2012) The factor–spline–GARCH model for high and low frequency correlations. Journal of Business & Economic Statistics, 30, 109–124.

Ruppert (2002) Selecting the number of knots for penalized splines. Journal of Computational and Graphical Statistics, 11, 735–757.

Ruppert, Wand and Carroll (2003) Semi-parametric Regression. Cambridge: Cambridge University Press.

Speckman and Sun (2003) Fully Bayesian spline smoothing and intrinsic autoregressive priors. Biometrika, 90, 289–302.

Spiegelhalter, Best, Carlin and Van Der Linde (2002) Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B, 64, 583–639.

Vehtari, Gelman and Gabry (2017) Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432.

Vogt and Linton (2017) Classification of non-parametric regression functions in longitudinal data models. Journal of the Royal Statistical Society: Series B, 79, 5–27.

Wahba (1990) Spline Models for Observational Data. Philadelphia: SIAM (Society for Industrial and Applied Mathematics).

Wand (2003) Smoothing and mixed models. Computational Statistics, 18, 223–249.

Watanabe and Opper (2010) Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. Journal of Machine Learning Research, 11, 3571–3594.

Zhang, Lin, Raz and Sowers (1998) Semiparametric stochastic mixed models for longitudinal data. Journal of the American Statistical Association, 93, 710–719.

A time-varying GARCH mixed-effects model for isolating high- and low- frequency volatility and co-volatility
