Sage Journals: Discover world-class research

Abstract

Several researchers have described two-part models with patient-specific stochastic processes for analysing longitudinal semicontinuous data. In theory, such models can offer greater flexibility than the standard two-part model with patient-specific random effects. However, in practice, the high dimensional integrations involved in the marginal likelihood (i.e. integrated over the stochastic processes) significantly complicates model fitting. Thus, non-standard computationally intensive procedures based on simulating the marginal likelihood have so far only been proposed. In this paper, we describe an efficient method of implementation by demonstrating how the high dimensional integrations involved in the marginal likelihood can be computed efficiently. Specifically, by using a property of the multivariate normal distribution and the standard marginal cumulative distribution function identity, we transform the marginal likelihood so that the high dimensional integrations are contained in the cumulative distribution function of a multivariate normal distribution, which can then be efficiently evaluated. Hence, maximum likelihood estimation can be used to obtain parameter estimates and asymptotic standard errors (from the observed information matrix) of model parameters. We describe our proposed efficient implementation procedure for the standard two-part model parameterisation and when it is of interest to directly model the overall marginal mean. The methodology is applied on a psoriatic arthritis data set concerning functional disability.

Keywords

Semicontinuous data two-part models overall marginal mean patient-specific inference serial correlation psoriatic arthritis

1 Introduction

Semicontinuous data arise when the outcome is a mixture of true zeros and continuously distributed positive values.¹ Some examples in the literature have included average daily alcohol consumption,¹ hospital lengths of stay² and medical expenditures.^3,4 In these situations, and more generally, it is natural to view the outcome as a result of two processes, the first determines if the outcome is zero, and if not the second determines the positive value. Two-part models are therefore convenient for the analysis of semicontinuous data and have been used extensively. Recently, Smith et al.^3,4 considered the interesting notion of reparameterising the mean of the positive values in terms of the overall mean, which is arguably a more justified target of inference (see Tom et al.⁵ and the references therein). We also consider this notion with respect to the overall marginal mean in our framework.

Two-part marginal models and two-part mixed models have both been proposed for the analysis of longitudinal semicontinuous data. The first is motivated by obtaining population-based inference and have been constructed using generalized estimating equations.⁶ The second is more convenient when patient-specific inference is of interest and are constructed by incorporating correlated patient-specific random effects in both parts of the model.⁷ This paper focuses on the two-part mixed modelling approach, although considerations are provided on how population-based inference can be obtained.

In some situations, correlated patient-specific random effects models will not provide an adequate fit to the data. This may especially be the case when the lengths of follow-up are relatively long. Here, it may be less plausible to assume that patients can only have consistently high or low outcomes throughout their entire follow-up. In terms of the correlation structure, it may not be reasonable to assume constant correlation between outcomes from the same patient regardless of their gap times (which is induced by patient-specific random effects). Flexible two-part models that allow for random changes in the trajectory through serially correlated stochastic processes may then be more plausible and these have been proposed in the literature. Albert and Shen⁸ and Ghosh and Albert⁹ proposed two-part mixed models that consisted of correlated Gaussian processes and random walks (in addition to correlated patient-specific random effects), respectively, in both parts of the model. Albert and Shen⁸ demonstrated, through their application and a simulation study, that overall conditional means may suffer from bias if serial correlation (which is not captured by patient-specific random effects) is present but ignored. It is also worth noting, both models incorporating stochastic processes provided considerable improvements of fit to their data.

A main drawback of fitting models with stochastic processes is the computationally intensive nature of the model fitting procedure. The primary difficulty results from the following feature: if a patient has m_i observations, then a model consisting of correlated stochastic processes in each part of the model will require $2 m_{i}$ integrations to evaluate the marginal likelihood contribution from that patient (assuming, as is usual, the stochastic processes are realised at the observation times). For manageable values of m_i, Albert and Shen⁸ and Ghosh and Albert⁹ have developed methods based on a Monte Carlo Expectation Maximization algorithm and Markov chain Monte Carlo, respectively, to evaluate the marginal likelihood. Both of these procedures can be computationally intensive, with the former also requiring standard errors of parameter estimates to be computed by bootstrap. The primary aim of this paper is to demonstrate, using a property of the multivariate normal distribution and the standard marginal cumulative distribution function identity, how a marginal likelihood can be obtained in terms of the cumulative distribution function of a multivariate normal distribution. Implicitly, because it is possible to efficiently evaluate the cumulative distribution function of a multivariate normal distribution, maximum likelihood estimation can be used to obtain parameter estimates and (asymptotic) standard errors (from the observed information matrix) of model parameters.

The rest of this paper is organised as follows. In Section 2, the motivating application concerning functional disability in psoriatic arthritis is introduced. Section 3 describes the flexible two-part modelling framework of Albert and Shen⁸ and Ghosh and Albert⁹ (including additional comments regarding implementation). Section 4 proposes an efficient maximum likelihood estimation procedure for the models in Section 3. Section 5 applies the methodology in Section 4 to the data described in Section 2. While retaining the flexibility of using stochastic processes models and the practicality of the proposed efficient implementation procedure, Section 6 extends the modelling framework of Section 3 to allow for the direct modelling of the overall marginal mean. Finally, concluding remarks are made in Section 7.

2 Functional disability in psoriatic arthritis

Psoriatic arthritis (PsA) is an inflammatory arthritis associated with the skin condition psoriasis. Because of both skin and joint involvement of the disease, PsA can result in patients having severe physical functional disability. The dominant measure of functional disability in PsA, as well as in many other disease areas,¹⁰ is the self-reported health assessment questionnaire (HAQ). This produces an essentially continuous measure^11–14 between zero, representing no disability, and three, representing severe disability.

The HAQ scores of 698 patients observed longitudinally at the University of Toronto PsA clinic were considered for this analysis. Figure 1 shows the frequencies of HAQ scores from these patients. From Figure 1, it is evident that a large proportion of zeros exist in this data set (1526/4811 = 0.32). The clumping at zero, together with the continuous distributed outcomes for the non-zero values, suggests that the HAQ score can be viewed as a semicontinuous outcome. Su et al.^12,13 considered two-part models with patient-specific random effects for analysing an earlier version of this PsA data set. In this paper, we relax the assumption of constant patient-specific random effects to patient-specific stochastic processes and consider the extent to which they improve understanding of the disability process. This includes making easy interpretable inference on the overall marginal mean HAQ scores, a concept that has not been considered before with stochastic processes models (see Section 6 for more details). On average, patients had 6.89 clinic visits (ranging from 2 to 20) with mean inter-visit and follow-up times of 1 year and 5 months (standard deviation (SD) of 1 year and 1 month) and 8 years and 3 months (SD of 5 years and 10 months), respectively.

Figure 1.

Frequencies of HAQ scores in our data.

3 Model

Let Y_ij ( $i = 1, \dots, N$ ) denote the semicontinuous response from patient i at time t_ij ( $j = 1, \dots, m_{i}$ ), where t_ij represents the time of the jth observation from patient i. Because of true zeros, it is natural to decompose the response into

U_{i j} = {\begin{matrix} 1 & : Y_{i j} > 0 \\ 0 & : Y_{i j} = 0 \end{matrix}

and

g (Y_{ij}) | Y_{ij} > 0

, where

g (\cdot)

is a monotonic function such that

g (0) = 0

and

g (Y_{ij}) | Y_{ij} > 0

is positive and approximately Gaussian with constant variance

σ^{2}

. For convenience, the model for U_ij is referred to as the binary component, while the model for

g (Y_{ij}) | Y_{ij} > 0

is referred to as the continuous component.

We now describe the flexible modelling framework. Let $X_{ij}$ and $Z_{ij}$ be column vectors of covariates that influence the probability of Y_ij > 0 and the mean of $g (Y_{ij}) | Y_{ij} > 0$ , respectively. Then conditional on correlated patient-specific random effects $(B_{i}^{r} = b_{i}^{r}, C_{i}^{r} = c_{i}^{r})$ and correlated stochastic processes ${B_{i}^{s} (t_{ij}) = b_{ij}^{s}, C_{i}^{s} (t_{ij}) = c_{ij}^{s}}$ , where the random effects are assumed independent of the stochastic processes, we model U_ij as Bernoulli with response probability

ℙ (U_{ij} = 1 | b_{ij}^{s}, b_{i}^{r}) = Φ (X_{ij}^{⊤} β + b_{ij}^{s} + b_{i}^{r})

(1)

where

Φ (\cdot)

is the cumulative distribution function of a standard Gaussian distribution (i.e. probit model), and

[g (Y_{ij}) | Y_{ij} > 0; c_{ij}^{s}, c_{i}^{r}]

as Gaussian with mean

Z_{ij}^{⊤} γ + c_{ij}^{s} + c_{i}^{r}

and constant variance

σ^{2}

(i.e. linear mixed effect model on

g (Y_{ij}) | Y_{ij} > 0

). Here,

β

and

γ

are column vectors of regression coefficients. The patient-specific random effects

(B_{i}^{r}, C_{i}^{r})

allow patients to have a consistently high or low probability of having disability and a consistently high or low mean for the non-zero HAQ scores across time. While the patient-specific stochastic processes

{B_{i}^{s} (t_{i 1}), \dots, B_{i}^{s} (t_{{im}_{i}}), C_{i}^{s} (t_{i 1}), \dots, C_{i}^{s} (t_{{im}_{i}})}

can capture serial correlation and non-predictable changes in unobserved heterogeneity.⁹

We assume ${B_{i}^{r}, C_{i}^{r}}$ follows a bivariate normal distribution with mean vector zero and

Var (B_{i}^{r}) = σ_{b}^{2}, Var (C_{i}^{r}) = σ_{c}^{2}, Cov (B_{i}^{r}, C_{i}^{r}) = σ_{b} σ_{c} ρ

(2)

where

σ_{b}^{2}

and

σ_{c}^{2}

are variance parameters and ρ is the correlation between

B_{i}^{r}

and

C_{i}^{r}

. Furthermore, we consider two classes of stochastic processes for

{B_{i}^{s} (t), C_{i}^{s} (t)}

that are subsequently described. For convenience, define

B_{i} (t) = B_{i}^{r} + B_{i}^{s} (t)

and

C_{i} (t) = C_{i}^{r} + C_{i}^{s} (t)

, i.e. the patient-specific random effects

B_{i}^{r}

and

C_{i}^{r}

are absorbed into the stochastic processes

B_{i} (t)

and

C_{i} (t)

, respectively, and let the covariance matrix of

{B_{i} (t_{i 1}), \dots, B_{i} (t_{{im}_{i}}), C_{i} (t_{i 1}), \dots, C_{i} (t_{{im}_{i}})}

Σ_{i} = (\begin{matrix} Σ_{i b} & Σ_{i b c} \\ Σ_{i b c} & Σ_{i c} \end{matrix}) .

3.1 Correlated Gaussian processes

The first and most general model that we consider is defined when ${B_{i}^{s} (t), C_{i}^{s} (t)}$ are correlated stationary Gaussian processes. That is the model proposed by Albert and Shen⁸

\begin{array}{l} Cov {B_{i}^{s} (t_{i j}), B_{i}^{s} (t_{i k})} = σ_{g b}^{2} ρ_{g b}^{| t_{i j} - t_{i k} |}, Cov {C_{i}^{s} (t_{i j}), C_{i}^{s} (t_{i k})} = σ_{g c}^{2} ρ_{g c}^{| t_{i j} - t_{i k} |} \\ Cov {B_{i}^{s} (t_{i j}), C_{i}^{s} (t_{i k})} = σ_{g b} σ_{g c} ρ_{g} ρ_{g b c}^{| t_{i j} - t_{i k} |} \end{array}

(3)

where

σ_{gb}^{2}

and

σ_{gc}^{2}

are variance parameters, ρ_g is the correlation between the Gaussian processes at the same time point, and ρ_gb, ρ_gc, ρ_gbc are the degradation parameters governing the serial correlation within and between processes, respectively. Following Albert and Shen,⁸ the processes

B_{i} (t)

and

C_{i} (t)

are taken to be exchangeable Ornstein-Uhlenbeck (EOU) processes, and the model containing these processes is called the general model, i.e. (1–3). Some special cases of the general model are

Shared EOU process model when $c_{ij} = θ b_{ij}$ ,

Correlated OU processes model when $σ_{b}^{2} \equiv σ_{c}^{2} \equiv ρ \equiv 0$ ,

Shared OU process model when $c_{ij}^{s} = θ b_{ij}^{s}$ and $σ_{b}^{2} \equiv σ_{c}^{2} \equiv ρ \equiv 0$ ,

Correlated random effects model when $σ_{gb}^{2} \equiv σ_{gc}^{2} \equiv ρ_{g} \equiv ρ_{gb} \equiv ρ_{gc} \equiv ρ_{gbc} \equiv 0$ ,

Shared random effect model when $c_{i}^{r} = θ b_{i}^{r}$ and $σ_{gb}^{2} \equiv σ_{gc}^{2} \equiv ρ_{g} \equiv ρ_{gb} \equiv ρ_{gc} \equiv ρ_{gbc} \equiv 0$ ,

where θ is a parameter to be estimated.

3.1.1 Remarks on ρ_gb, ρ_gc and ρ_gbc

Although the general model is very flexible, it will not always be mathematically valid. Let the covariance matrices $Σ_{ib}^{s}, Σ_{ic}^{s}$ and $Σ_{ibc}^{s}$ have (j, k)th entry $σ_{gb}^{2} ρ_{gb}^{| t_{ij} - t_{ik} |}, σ_{gc}^{2} ρ_{gc}^{| t_{ij} - t_{ik} |}$ and $σ_{gb} σ_{gc} ρ_{g} ρ_{gbc}^{| t_{ij} - t_{ik} |}$ respectively, i.e. described by equation (3). If ρ_gb, ρ_gc and ρ_gbc are unconstrained (as specified by Albert and Shen⁸), the matrix $Σ_{i}^{s}$ where

Σ_{i}^{s} = (\begin{matrix} Σ_{i b}^{s} & Σ_{i b c}^{s} \\ Σ_{i b c}^{s} & Σ_{i c}^{s} \end{matrix})

will not in general be a valid covariance matrix since

Σ_{i}^{s}

, although symmetric, is not constrained to be positive semi-definite and therefore

B_{i}^{s} (t)

and

C_{i}^{s} (t)

will not necessarily form a jointly Gaussian process. The primary difficulty results when ρ_g (the correlation between

B_{i}^{s} (t)

and

C_{i}^{s} (t)

at each time t) is close to one because the processes

B_{i}^{s} (t)

and

C_{i}^{s} (t)

are similar and therefore it will not be plausible for them to degrade at vastly different rates (i.e. for ρ_gb, ρ_gc and ρ_gbc to be vastly different). A reasonable approximation in this situation would be to constrain the degradation and cross degradation parameters to be same, specifically

ρ_{gb} \equiv ρ_{gc} \equiv ρ_{gbc} \equiv ρ_{g 1}

. This constraint would then enforce

Σ_{i}^{s}

to be a valid covariance matrix since the Schur component

Σ_{ib}^{s} - Σ_{ibc}^{s} (Σ_{ic}^{s}) -^{1} Σ_{ibc}^{s} \equiv σ_{gb}^{2} (1 - ρ_{g}^{2}) Σ_{i}^{s} (ρ_{g 1})

, where

Σ_{i}^{s} (ρ_{g 1})

has (j, k)th entry

ρ_{g 1}^{| t_{ij} - t_{ik} |}

, is constrained to be positive semi-definite. The resulting correlation structure would then be

\begin{array}{l} Cor {B_{i}^{s} (t_{i j}), B_{i}^{s} (t_{i k})} = Cor {C_{i}^{s} (t_{i j}), C_{i}^{s} (t_{i k})} = ρ_{g 1}^{| t_{i j} - t_{i k} |} \\ Cor {B_{i}^{s} (t_{i j}), C_{i}^{s} (t_{i k})} = ρ_{g} ρ_{g 1}^{| t_{i j} - t_{i k} |} \end{array}

(4)

In the motivating application, ρ_g was estimated close to one. Slight deviations from the correlation structure described by equation (4) (for example $ρ_{gb} \equiv ρ_{gc} \equiv ρ_{g 1}$ and $ρ_{gbc} = ρ_{g 1} δ$ where $δ \in (0, 1)$ ) resulted in non-positive semi-definite matrices for various $Σ_{i}$ , and therefore the model fitting procedure was problematic. Note that a further simplification would be to constrain $ρ_{g} = 1$ (in addition to $ρ_{g 1} = ρ_{gb}$ ), this would result in the shared EOU process model. If, however, ρ_g takes a smaller value, and therefore the two Gaussian processes are less correlated, it would then be more plausible for the Gaussian processes to degrade at different rates. Hence, having unconstrained degradations parameters will likely be less problematic.

For completeness, note that

\begin{array}{l} {Σ_{i b}}_{j k} = σ_{b}^{2} + σ_{g b}^{2} ρ_{g b}^{| t_{i j} - t_{i k} |} \\ {Σ_{i c}}_{j k} = σ_{c}^{2} + σ_{g c}^{2} ρ_{g c}^{| t_{i j} - t_{i k} |} \\ {Σ_{i b c}}_{j k} = σ_{b} σ_{c} ρ + σ_{g b} σ_{g c} ρ_{g} ρ_{g b c}^{| t_{i j} - t_{i k} |} \end{array}

(5)

3.2 Correlated random walks

The second model structure that we consider is defined when ${B_{i}^{s} (t), C_{i}^{s} (t)}$ are correlated continuous-time random walks. That is the model proposed by Ghosh and Albert.⁹ Specifically, define sequentially ${B_{i} (t_{ij + 1}), C_{i} (t_{ij + 1})} | {B_{i} (t_{ij}) = b_{ij}, C_{i} (t_{ij}) = c_{ij}}$ to be bivariate normal with mean $(b_{ij}, c_{ij})$ and covariance matrix

(\begin{matrix} σ_{w b}^{2} (t_{i j + 1} - t_{i j}) & σ_{w b} σ_{w c} ρ_{w} (t_{i j + 1} - t_{i j}) \\ σ_{w b} σ_{w c} ρ_{w} (t_{i j + 1} - t_{i j}) & σ_{w c}^{2} (t_{i j + 1} - t_{i j}) \end{matrix})

In addition $(b_{i 1}, c_{i 1}) = (b_{i}^{r}, c_{i}^{r})$ are initiated at realisations of the patient-specific random effects. Here, $σ_{wb}^{2}, σ_{wc}^{2}$ and ρ_w are variance and correlation parameters that quantify serial correlation (both within and across processes). This model will be denoted by a correlated random walks (CRW) model and it contains as special cases.

Shared random walk model when $c_{ij} = θ b_{ij}$ ,

Correlated random effects model when $σ_{wb}^{2} \equiv σ_{wc}^{2} \equiv ρ_{w} \equiv 0$ ,

Shared random effect model when $c_{i}^{r} = θ b_{i}^{r}$ and $σ_{wb}^{2} \equiv σ_{wc}^{2} \equiv ρ_{w} \equiv 0$ .

Although the CRW model is less flexible than the general model, it has the advantage, from its sequential construction, of always being well defined even when the parameters are unconstrained (apart from the usual constraint that correlation parameters have modulus less than or equal to unity). Moreover

\begin{array}{l} {Σ_{i b}}_{j k} = σ_{b}^{2} + σ_{w b}^{2} {min (t_{i j}, t_{i k}) - t_{i 1}} \\ {Σ_{i c}}_{j k} = σ_{c}^{2} + σ_{w c}^{2} {min (t_{i j}, t_{i k}) - t_{i 1}} \\ {Σ_{i b c}}_{j k} = σ_{b} σ_{c} ρ + σ_{w b} σ_{w c} ρ_{w} {min (t_{i j}, t_{i k}) - t_{i 1}} \end{array}

(6)

4 Efficient maximum likelihood estimation procedure for stochastic processes models

This section describes our efficient maximum likelihood estimation procedure for the flexible models described in Section 3. Firstly, in Section 4.1, we describe a generic likelihood function for all of the described models. The multivariate normal identity that can be used to evaluate certain multi-dimensional integrals in terms of a multivariate normal cumulative distribution function is introduced in Section 4.2. Finally, in Section 4.3, we outline how to apply the multivariate normal identity in Section 4.2 to the generic likelihood function in Section 4.1, thus culminating in a computationally efficient likelihood. For completeness, we also provide computational simplifications for correlated stochastic processes models in the appendix.

4.1 Likelihoods

For ease of exposition, we describe the likelihood contribution from patient i. The likelihood can then be obtained by taking the product of all likelihood contributions from each patient. Firstly, we consider models that contain two (correlated) stochastic processes. For these models, the likelihood contribution from patient i is

\begin{array}{l} L_{i} (Θ) = \int_{b_{i}} \int_{c_{i}} [\prod_{j = 1}^{m_{i}} Φ {(X_{i j}^{⊤} β + b_{i j})}^{u_{i j}} {1 - Φ (X_{i j}^{⊤} β + b_{i j})}^{1 - u_{i j}}] \\ \times [\prod_{j = 1}^{m_{i}} {[\frac{1}{\sqrt{2 π σ^{2}}} \exp {- \frac{{(g (y_{i j}) - Z_{i j}^{⊤} γ - c_{i j})}^{2}}{2 σ^{2}}}]}^{u_{i j}}] ϕ^{(2 m_{i})} (b_{i}, c_{i}; 0, Σ_{i}) d b_{i} d c_{i} \end{array}

(7)

where

Θ

is a vector comprising all of the unknown parameters,

b_{i} = (b_{i 1}, \dots, b_{{im}_{i}})^{⊤}

c_{i} = (c_{i 1}, \dots, c_{{im}_{i}})^{⊤}, φ^{(m)} (.; μ, Σ)

is an m dimensional multivariate normal density with mean vector

μ

and covariance matrix

Σ

, and

Σ_{i}

is defined by either equation (5) or equation (6). Similarly, for models containing a single stochastic process (i.e. shared process models), the likelihood contribution from patient i is

\begin{array}{l} L_{i} (Θ) = \int_{b_{i}} [\prod_{j = 1}^{m_{i}} Φ {(X_{i j}^{⊤} β + b_{i j})}^{u_{i j}} {1 - Φ (X_{i j}^{⊤} β + b_{i j})}^{1 - u_{i j}}] \\ \times [\prod_{j = 1}^{m_{i}} {[\frac{1}{\sqrt{2 π σ^{2}}} \exp {- \frac{{(g (y_{i j}) - Z_{i j}^{⊤} γ - θ b_{i j})}^{2}}{2 σ^{2}}}]}^{u_{i j}}] ϕ^{(m_{i})} (b_{i}; 0, Σ_{i b}) d b_{i} \end{array}

(8)

where

Σ_{ib}

can again be obtained from equation (5) or equation (6). We now define our generic likelihood contribution from patient i which encompasses all of the described models. Throughout we apply the following notation:

0

and

1

are

m_{i} \times 1

vectors with all entries being zero and one respectively,

diag (v)

is a matrix with diagonal elements v and zero otherwise, and

I_{d}

is a d × d identity matrix. We also follow the convention that binary operations with a scalar and vector or matrix argument and unary operations with a vector argument are performed element-wise. In matrix form, we have

\begin{array}{l} L_{i} (Θ) = \int_{l_{i}} Φ^{(m_{i})} (A_{i 1} μ_{i b} + A_{i 2} l_{i}; 0, I_{m_{i}}) {(\frac{1}{\sqrt{2 π σ^{2}}})}^{\sum_{j = 1}^{m_{i}} u_{i j}} \\ \times \exp {- \frac{{(g (y_{i}) - A_{i 3} μ_{i c} - A_{i 4} l_{i})}^{⊤} (g (y_{i}) - A_{i 3} μ_{i c} - A_{i 4} l_{i})}{2 σ^{2}}} ϕ^{{dim (l_{i})}} (l_{i}; 0, Σ_{i l}) d l_{i} \end{array}

(9)

where

y_{i} = (y_{i 1}, \dots, y_{{im}_{i}})^{⊤}, μ_{ib} = X_{i} β, μ_{ic} = Z_{i} γ, X_{i} = (X_{i 1}, \dots, X_{{im}_{i}})^{⊤}, Z_{i} = (Z_{i 1}, \dots, Z_{{im}_{i}})^{⊤}

, and

A_{i 1} = diag (2 u_{i} - 1), A_{i 3} = diag (u_{i})

are

m_{i} \times m_{i}

matrices with

u_{i} = (u_{i 1}, \dots, u_{{im}_{i}})^{⊤}

. Here

Φ^{(d)} (.; 0, Σ)

represents the distribution function of

φ^{(d)} (.; 0, Σ)

and

l_{i}

is a (to be specified) column vector of random effects. Note that equation (9) has resulted from repeated application of the identity

1 - Φ (x) = Φ (- x)

The likelihood contribution from patient i, $L_{i} (Θ)$ , is then obtained by specifying the vector of random effects $l_{i}$ and its covariance matrix $Σ_{il}$ together with the $m_{i} \times dim (l_{i})$ matrices $A_{i 2}$ and $A_{i 4}$ which describe how the random effects act on the binary and continuous components of the model. For (7), $l_{i} = (b_{i}, c_{i}), A_{i 2} = (diag (2 u_{i} - 1), diag (0))$ and $A_{i 4} = (diag (0), diag (u_{i}))$ . While for equation (8), $l_{i} = b_{i}, A_{i 2} = diag (2 u_{i} - 1)$ and $A_{i 4} = diag (θ u_{i})$ . Similarly, for the correlated random effects model, $l_{i} = (b_{i}, c_{i}), A_{i 2} = (2 u_{i} - 1, 0)$ and $A_{i 4} = (0, u_{i})$ , and for the shared random effect model, l_i = b_i, $A_{i 2} = 2 u_{i} - 1$ and $A_{i 4} = θ u_{i}$ .

4.2 Multivariate normal identity

In order to evaluate the likelihood described by equation (9), we derive a multivariate normal identity that makes use of a property of the multivariate normal distribution and the standard marginal cumulative distribution function identity. Firstly, suppose that $ω = (ω_{1}, ω_{2})^{⊤}$ follows a multivariate normal distribution with mean vector $(0, η)^{⊤}$ where $ω_{1}$ and $0$ are $k_{1} \times 1$ vectors and $ω_{2}$ and $η$ are $k_{2} \times 1$ vectors, respectively. Furthermore, suppose that the covariance matrix of $ω$ is the $(k_{1} + k_{2}) \times (k_{1} + k_{2})$ matrix $Σ$ where the first k₁ rows of $Σ$ is the $k_{1} \times (k_{1} + k_{2})$ matrix $(Σ_{22}, Σ_{12}^{⊤})$ and the remaining k₂ rows of $Σ$ is the $k_{2} \times (k_{1} + k_{2})$ matrix $(Σ_{12}, Σ_{11})$ respectively. It is a well-known result that $φ^{(k_{1} + k_{2})} (ω; (0, η)^{⊤}, Σ) = φ^{(k_{1})} (ω_{1}; Σ_{12}^{⊤} Σ_{11}^{- 1} (ω_{2} - η), Σ_{22} - Σ_{12}^{⊤} Σ_{11}^{- 1} Σ_{12}) φ^{(k_{2})} (ω_{2}; η, Σ_{11})$ where the right-hand side is the product of the conditional density of $ω_{1} | ω_{2}$ and the marginal density of $ω_{2}$ . By applying the standard marginal cumulative distribution function identity $F_{ω_{1}} (ω_{1}) = \int_{ω_{2}} F_{ω_{1} | ω_{2}} (ω_{1} | ω_{2}) f_{ω_{2}} (ω_{2}) d ω_{2}$ where the integrand is based on the right-hand side of the above result, we obtain the multivariate normal identity:

Φ^{(k_{1})} (ω_{1}; 0, Σ_{22}) = \int_{ω_{2}} Φ^{(k_{1})} (ω_{1} - Σ_{12}^{⊤} Σ_{11}^{- 1} (ω_{2} - η); 0, Σ_{22} - Σ_{12}^{⊤} Σ_{11}^{- 1} Σ_{12}) φ^{(k_{2})} (ω_{2}; η, Σ_{11}) d ω_{2}

(10)

by noting that the marginal distribution of

ω_{1}

is multivariate normal with mean vector

0

and covariance matrix

Σ_{22}

Returning to the application, the general idea is to rearrange equation (9) to take the form of the right-hand side of equation (10), and then to use equation (10) to compute the integrations over the random effects in terms of an m_i dimensional normal cumulative distribution function. Because there exists efficient implementations of the multivariate normal cumulative distribution function, this approach will allow for the efficient computation of the generic likelihood. We note that Barrett et al.¹⁵ used equation (10) to obtain computationally efficient likelihoods of flexible models that jointly consider longitudinal and time to event outcomes. Equation (10) also arises frequently in results concerning the multivariate skew normal distribution.^16–19

4.3 Re-expressing the likelihoods

This section demonstrates how equation (9) (the likelihood contribution from patient i) can be re-expressed. We firstly consider the integrand terms resulting from the continuous component and random effects. That is

(\frac{1}{\sqrt{2 π σ^{2}}})^{\sum_{j = 1}^{m_{i}} u_{ij}} \exp {- \frac{(g (y_{i}) - A_{i 3} μ_{ic} - A_{i 4} l_{i})^{⊤} (g (y_{i}) - A_{i 3} μ_{ic} - A_{i 4} l_{i})}{2 σ^{2}}} φ^{{dim (l_{i})}} (l_{i}; 0, Σ_{il})

(11)

By completing the square in $l_{i}$ (see the appendix for more details), equation (11) can be rearranged as

L_{i 1} φ^{{dim (l_{i})}} (l_{i}; h_{i}, H_{i}^{- 1})

(12)

where

\begin{array}{l} H_{i} = A_{i 4}^{⊤} A_{i 4} / σ^{2} + {(Σ_{i l})}^{- 1} \\ h_{i} = H_{i}^{- 1} A_{i 4}^{⊤} (g (y_{i}) - A_{i 3} μ_{i c}) / σ^{2} \end{array}

(13)

and

L_{i 1} = (\frac{1}{\sqrt{2 π σ^{2}}})^{\sum_{j = 1}^{m_{i}} u_{ij}} \frac{1}{| Σ_{il} H_{i} |^{1 / 2}} \exp {- \frac{(g (y_{i}) - A_{i 3} μ_{ic})^{⊤} (g (y_{i}) - A_{i 3} μ_{ic})}{2 σ^{2}} + \frac{h_{i}^{⊤} H_{i} h_{i}}{2}}

(14)

is independent of

l_{i}

. Substituting equation (12) into equation (9), we consider the integral (ignoring

L_{i 1}

)

\int_{l_{i}} Φ^{(m_{i})} (A_{i 1} μ_{ib} + A_{i 2} l_{i}; 0, I_{m_{i}}) φ^{{dim (l_{i})}} (l_{i}; h_{i}, H_{i}^{- 1}) d l_{i} .

(15)

We can re-express the argument and covariance matrix of the multivariate normal distribution function in equation (15) as

\begin{array}{l} A_{i 1} μ_{i b} + A_{i 2} l_{i} = A_{i 1} μ_{i b} + A_{i 2} h_{i} - (- H_{i}^{- 1} A_{i 2}^{⊤})^{⊤} H_{i} (l_{i} - h_{i}) \\ I_{m_{i}} = I_{m_{i}} + A_{i 2} H_{i}^{- 1} A_{i 2}^{⊤} - (- H_{i}^{- 1} A_{i 2}^{⊤})^{⊤} H_{i} (- H_{i}^{- 1} A_{i 2}^{⊤}) \end{array}

(16)

Therefore equation (15), after applying the multivariate normal identity (described equation (10)), is equivalent to

Φ^{(m_{i})} (A_{i 1} μ_{ib} + A_{i 2} h_{i}; 0, I_{m_{i}} + A_{i 2} H_{i}^{- 1} A_{i 2}^{⊤})

(17)

Based on the above expressions, the likelihood contribution from patient i can now be re-expressed as

\begin{array}{l} L_{i} (Θ) = Φ^{(m_{i})} (A_{i 1} μ_{i b} + A_{i 2} h_{i}; 0, I_{m_{i}} + A_{i 2} H_{i}^{- 1} A_{i 2}^{⊤}) {(\frac{1}{\sqrt{2 π σ^{2}}})}^{\sum_{j = 1}^{m_{i}} u_{i j}} \\ \times \frac{1}{| Σ_{i l} H_{i} |^{1 / 2}} \exp {- \frac{{(g (y_{i}) - A_{i 3} μ_{i c})}^{⊤} (g (y_{i}) - A_{i 3} μ_{i c})}{2 σ^{2}} + \frac{h_{i}^{⊤} H_{i} h_{i}}{2}} \end{array}

(18)

where

\begin{array}{l} A_{i 1} = diag (2 u_{i} - 1) \\ A_{i 3} = diag (u_{i}) \\ H_{i} = A_{i 4}^{⊤} A_{i 4} / σ^{2} + {(Σ_{i l})}^{- 1} \\ h_{i} = H_{i}^{- 1} A_{i 4}^{⊤} (g (y_{i}) - A_{i 3} μ_{i c}) / σ^{2} \end{array}

(19)

and

Σ_{il} = Σ_{i}

Σ_{ib}

with

A_{i 2}

and

A_{i 4}

defined by the specified model.

From equations (18) and (19), it is now evident that evaluating the integrations involved in $L_{i} (Θ)$ reduces to computing the cumulative distribution function of a multivariate normal distribution. This can be performed efficiently, for example by using the R²⁰ package mnormt.²¹ The model fitting procedure is then completed by maximizing the log-likelihood, for example by using the BFGS²² optimization technique, to obtain parameter estimates and asymptotic standard errors (from the observed Fisher information matrix) of model parameters.

5 Application: Patient-specific inference

Using the estimation procedure described in Section 4, we demonstrate how patient-specific inference on the probability of being disabled and the transformed mean HAQ score conditional on disability can be obtained. Specifically, how a unit change in covariate values impacts these quantities for any specific patient. We consider the covariate effects of the number of clinically damaged joints (time-dependent), the number of actively inflamed joints (time-dependent), sex (coded as 1 for males and 0 for females), arthritis duration in years (time-dependent), and age at onset of arthritis in years (standardise). Following Su et al.,^12,13 no transformation was applied to the non-zero HAQ scores, i.e. g(y) = y.

Initially, models with two stochastic processes were fitted to the HAQ data. This resulted in large estimated correlation parameters between the random effects (i.e. $ρ \approx 1$ ) and stochastic processes for both the correlated Gaussian processes and random walks cases (i.e. ρ_g and $ρ_{w} \approx 1$ ). These results therefore suggested a single stochastic process would be sufficient for describing the data. The shared EOU model was then fitted. However, the analysis provided evidence for model over-parameterisation as ${\hat{σ}}^{2}$ appeared to converge at virtually zero and a positive-definite observed Fisher information matrix could not be attained (even when a considerably smaller tolerance level than the default was specified for the computation of multivariate normal probabilities). We therefore considered the shared random walk and OU process models, and for comparative purposes, the shared random effect model. The models containing stochastic processes were fitted using the likelihood described by equations (18) and (19), while the shared random effect model was fitted using numerical integration (since only a single integration per patient is required). The same parameter estimates for the shared random effect model were obtained when equations (18) and (19) were used in the model fitting procedure.

Table 1 presents the results of the fitted models. Across the models, the covariate effects on the mean conditional on disability are seen to be relatively similar as the confidence intervals generally overlap. In addition, the models are in agreement with regard to the association of each covariate apart from arthritis duration. Arthritis duration is statistically significant in the shared random effect model but is not statistically significant in the models that incorporate stochastic processes. It is interesting to note that there are strong agreements regarding the covariate effect of the number of active joints (similar parameter estimates across models and relatively narrow confidence intervals). The models indicate an additional actively inflamed joint will increase the mean HAQ score conditional on disability by approximately 0.21 for any specific patient. For the binary component, the covariate effects are again seen to be relatively similar due to the overlapping confidence intervals. Their interpretation through the direction of association and statistical significance are also consistent across models. The covariate effects from the shared random effect model does, however, consistently demonstrate attenuation to the null when compared to the other models with stochastic processes.

Table 1.

Table displaying patient-specific effects and corresponding 95% Wald intervals on the probability of being disabled and the mean HAQ score conditional on disability.

	Shared random walk	Shared OU process	Shared random effect
Binary component
Damaged joints	0.031 (0.013, 0.049)	0.04 (0.021, 0.059)	0.012 (0.002, 0.022)
Active joints	0.16 (0.13, 0.18)	0.17 (0.15, 0.2)	0.15 (0.13, 0.16)
Sex	−1.72 (−1.9, −1.54)	−2.17 (−2.92, −1.42)	−1.34 (−1.65, −1.02)
Arthritis duration	0.042 (0.025, 0.058)	0.044 (0.022, 0.065)	0.034 (0.023, 0.044)
Age at arthritis onset^a	0.56 (0.45, 0.66)	0.68 (0.48, 0.88)	0.45 (0.3, 0.6)
Intercept	1.77 (1.62, 1.92)	1.97 (1.07, 2.87)	1.11 (0.81, 1.41)
Continuous component
Damaged joints	0.0065 (0.0032, 0.0097)	0.0078 (0.0046, 0.011)	0.0033 (0.00086, 0.0058)
Active joints	0.021 (0.019, 0.023)	0.021 (0.019, 0.023)	0.02 (0.018, 0.022)
Sex	−0.29 (−0.34, −0.24)	−0.35 (−0.46, −0.23)	−0.29 (−0.37, −0.21)
Arthritis duration	0.0025 (−0.0008, 0.0058)	0.0035 (−0.001, 0.0081)	0.0067 (0.0041, 0.0093)
Age at arthritis onset^a	0.076 (0.048, 0.1)	0.093 (0.046, 0.14)	0.086 (0.048, 0.12)
Intercept	0.62 (0.57, 0.66)	0.59 (0.44, 0.74)	0.63 (0.56, 0.7)
θ	0.2 (0.19, 0.22)	0.19 (0.17, 0.21)	0.27 (0.24, 0.3)
$σ^{2}$	0.074 (0.069, 0.079)	0.066 (0.06, 0.072)	0.12 (0.11, 0.12)
$σ_{b}^{2}$	6.3 (5.64, 7.04)		3.29 (2.64, 4.1)
$σ_{gb}^{2}$		10.11 (8.03, 12.74)
$σ_{wb}^{2}$	0.58 (0.52, 0.65)
ρ_gb		0.95 (0.94, 0.96)
Log-likelihood	−3279.08	−3282.11	−3500.48

Denotes the standardised version of the covariate.

A generalized likelihood ratio test of $σ_{wb}^{2} = 0$ and $ρ_{gb} = 1$ produced p values of < 0.001 therefore suggesting preference towards the shared random walk and OU process models respectively when compared to the shared random effect model. Since the shared random walk and OU process models contain the same number of parameters, information criteria, such as AIC, would indicate (weakly) that the shared random walk model is preferable. It is also worth noting that the heterogeneity parameter in the binary component (i.e. $σ_{b}^{2}$ or $σ_{gb}^{2}$ ) is significantly lower in the shared random effect model. For this model, this parameter governs both the heterogeneity and correlation due to repeated measurements and therefore in light of greater unaccounted heterogeneity (compared to the models with stochastic processes), less correlation is expected.²³ In the continuous component, where $σ^{2}$ also accounts for heterogeneity, a smaller difference between the heterogeneity parameters (i.e. $θ^{2} σ_{b}^{2}$ or $θ^{2} σ_{gb}^{2}$ ) is seen; in the order of the models displayed in the table (from right to left), the heterogeneity parameters are 0.24, 0.36 and 0.25, respectively.

6 Modelling the overall marginal mean

In many cases, it is of interest to obtain population-based inference in addition/as opposed to patient-specific inference. For example, for strategic public health policy purposes, it would be more clinically meaningful to obtain covariate effects on quantities of interest after averaging over all patients. Currently, the proposed models are parametrised to allow easily interpretable patient-specific covariate effects, those with $B_{i} (t_{ij}) = b_{ij}$ and $C_{i} (t_{ij}) = c_{ij}$ , to act on the patient-specific mean of the transformed positive values (i.e. $E [g (Y_{ij}) | Y_{ij} > 0, C_{i} (t_{ij}) = c_{ij}]$ ) and the patient-specific probability of a having a positive value (i.e. $ℙ (U_{ij} = 1 | B_{i} (t_{ij}) = b_{ij})$ ). However, under this parametrisation, it no longer becomes straightforward to obtain easily interpretable population-level covariate effects on the marginal mean of the transformed positive values (the mean of the transformed positive values after averaging over all b_ij and c_ij, i.e. $E [g (Y_{ij}) | Y_{ij} > 0]$ ) since it is a highly non-linear function of the linear predictors in the binary and continuous components.⁵ Thus, the effect of a single covariate is generally interpreted by fixing other covariates at certain values.⁸ This problem remains even when population-level covariate effects on the overall marginal mean of the transformed values (i.e. $E [g (Y_{ij})]$ ) are of primary interest, which has strongly been argued as an important target of inference;²⁴ it is estimated using data from the same patients over time (unlike $E [g (Y_{ij}) | Y_{ij} > 0]$ ) and it is a measure of the undecomposed outcome. We reiterate that in considering the overall marginal mean of the transformed values as a target of inference, we assume that the monotonic transformation function is such that $g (0) = 0$ and $g (Y_{ij}) | Y_{ij} > 0$ is positive and approximately Gaussian with constant variance $σ^{2}$ .

In order to obtain population-based inference on the overall marginal mean of the transformed values, Smith et al.⁴ proposed the following model parameterisation

\begin{array}{l} ℙ (U_{i j} = 1 | B_{i}^{r} = b_{i}^{r}) = g_{1} (X_{i j}^{⊤} β + b_{i}^{r}) \\ E [g (Y_{i j}) | C_{i}^{r} = c_{i}^{r}] = g_{2} (Z_{i j}^{⊤} α + c_{i}^{r}) \end{array}

(20)

where

g_{1} (\cdot)

and

g_{2} (\cdot)

are monotonic link functions and

B_{i}^{r}, C_{i}^{r}

are, as before, zero mean bivariate normal patient-specific random effects. Recall that transformation and link functions differ in that transformation functions are applied prior to modelling. In their specific context, Smith et al.⁴ considered the identity transformation for

g (\cdot)

but allowed the positive values of Y_ij to follow a log-skew-normal distribution. Under this parametrisation, for a suitably chosen link function such as

g_{2}^{- 1}

being the identity or log link, it is implicit that easily interpretable covariate effects on the overall marginal mean of

g (Y_{ij}), α

, can now be obtained. Smith et al.⁴ implemented this model by using a Bayesian estimation approach with

E [g (Y_{ij}) | Y_{ij} > 0, B_{i}^{r} = b_{i}^{r}, C_{i}^{r} = c_{i}^{r}] = \frac{g_{2} (Z_{ij}^{⊤} α + c_{i}^{r})}{g_{1} (X_{ij}^{⊤} β + b_{i}^{r})}

specified in the likelihoods defined by equation (7) or equation (8). Note that

E [g (Y_{ij}) | Y_{ij} > 0, B_{i}^{r} = b_{i}^{r}, C_{i}^{r} = c_{i}^{r}]

is no longer parametrised to be equivalent to a monotonic function of a linear predictor, as was specified before. While this approach for modelling the overall marginal mean is intuitive, it is clear that the multivariate normal identity in Section 4.2 can no longer be used to compute the integrations over the multi-dimensional random effects in the marginal likelihood. Thus, as mentioned in the introduction, implementation of such models can be computationally challenging, especially for our situation where it would be of interest to consider b_ij and c_ij (i.e. realisations of stochastic processes) instead of

b_{i}^{r}

and

c_{i}^{r}

(i.e. realisations of patient-specific random effects) in equation (20).

We now propose another method which would allow easily interpretable covariate effects to act on the overall marginal mean of $g (Y_{ij})$ . In contrast, this method facilitates the inclusion of stochastic processes because it retains the proposed efficient implementation procedure described in Section 4. To the best of our knowledge, there are no other methods in the literature that facilitates the practical implementation of stochastic processes models for directly modelling the overall marginal mean.

We first begin by computing the overall marginal mean of $g (Y_{ij})$ when

\begin{array}{l} ℙ (U_{i j} = 1 | B_{i} (t_{i j}) = b_{i j}) = g_{1} (X_{i j}^{⊤} β + b_{i j}) \\ E [g (Y_{i j}) | Y_{i j} > 0, C_{i} (t_{i j}) = c_{i j}] = Δ_{i j} + c_{i j} \end{array}

(21)

where Δ _ij is a function of covariates (at the jth visit from patient i) and regression coefficients only. That is c_ij is now assumed to act additively on the mean of the transformed positive values and not on the overall mean of the transformed values as is the case in equation (20). For models with two processes, the overall marginal mean of

g (Y_{ij})

is defined by

E [g (Y_{ij})] = \int_{b} \int_{c} Φ (X_{ij}^{⊤} β + b_{ij}) (Δ_{ij} + c_{ij}) φ^{(2)} (b_{ij}, c_{ij}; 0, Σ_{ij}) d b_{ij} d c_{ij}

where

Σ_{i j} = (\begin{matrix} σ_{b i j}^{2} & σ_{b i j} σ_{c i j} ρ_{b c i j} \\ σ_{b i j} σ_{c i j} ρ_{b c i j} & σ_{c i j}^{2} \end{matrix})

and

σ_{bij}^{2}, σ_{cij}^{2}

and ρ_bcij are the variances and correlation of

B_{i} (t_{ij})

and

C_{i} (t_{ij})

, respectively. Similarly, the overall marginal mean for a shared process model is given by

E [g (Y_{ij})] = \int_{b} Φ (X_{ij}^{⊤} β + b_{ij}) (Δ_{ij} + θ b_{ij}) φ^{(1)} (b_{ij}; 0, σ_{bij}^{2}) d b_{ij} .

Conveniently, these integrals can be computed analytically and this results in

Φ (\frac{X_{ij}^{⊤} β}{\sqrt{1 + σ_{bij}^{2}}}) Δ_{ij} + \frac{σ_{bij} σ_{cij} ρ_{bcij}}{\sqrt{1 + σ_{bij}^{2}}} φ (\frac{X_{ij}^{⊤} β}{\sqrt{1 + σ_{bij}^{2}}})

and

Φ (\frac{X_{ij}^{⊤} β}{\sqrt{1 + σ_{bij}^{2}}}) Δ_{ij} + \frac{θ σ_{bij}^{2}}{\sqrt{1 + σ_{bij}^{2}}} φ (\frac{X_{ij}^{⊤} β}{\sqrt{1 + σ_{bij}^{2}}})

respectively. The derivation of the first overall marginal mean of

g (Y_{ij})

(resulting from models with two processes) can be found in the supplementary material of Tom et al.,⁵ and the second overall marginal mean of

g (Y_{ij})

is derived in the appendix. If we specify

E [g (Y_{ij})] = Z_{ij}^{⊤} α

, we can then reparametrise

Δ_{ij} = [Z_{ij}^{⊤} α - \frac{σ_{bij} σ_{cij} ρ_{bcij}}{\sqrt{1 + σ_{bij}^{2}}} φ (\frac{X_{ij}^{⊤} β}{\sqrt{1 + σ_{bij}^{2}}})] / Φ (\frac{X_{ij}^{⊤} β}{\sqrt{1 + σ_{bij}^{2}}})

(22)

and

Δ_{ij} = [Z_{ij}^{⊤} α - \frac{θ σ_{bij}^{2}}{\sqrt{1 + σ_{bij}^{2}}} φ (\frac{X_{ij}^{⊤} β}{\sqrt{1 + σ_{bij}^{2}}})] / Φ (\frac{X_{ij}^{⊤} β}{\sqrt{1 + σ_{bij}^{2}}})

in the respective models. Thus, as in equation (20),

α

offers easily interpretable covariate effects of

Z_{ij}

on the overall marginal mean of

g (Y_{ij})

(by definition). In particular, a unit change in components of

Z_{ij}

will increase the overall marginal mean of

g (Y_{ij})

by the respective components in

α

. However, from equations (21) and (22), it is also evident that replacing

Z_{ij}^{⊤} γ

with Δ _ij in Section 4 will still allow the proposed efficient estimation procedure to be applied. It is also possible to reparametrise patient-specific covariate effects

β

in the binary component in terms of population-level covariate effects

ξ

, specifically

β = ξ \sqrt{1 + σ_{bij}^{2}}

, since it can be shown that

ℙ (U_{ij} = 1) = Φ (X_{ij}^{⊤} β / \sqrt{1 + σ_{bij}^{2}})

. This relationship is easily proved. In the motivating application, this reparametrisation led to a numerically unstable optimization routine, therefore

(β, α)

was estimated with

\hat{ξ}

obtained as

\hat{β} / \sqrt{1 + {\hat{σ}}_{bij}^{2}}

and standard errors were calculated using the delta method.

6.1 Population-based inference

Using the parameterisations described in the previous subsection, we demonstrate how population-based inference on the probability of being disabled and the overall marginal mean HAQ score can be obtained. Specifically, on averaging across patients, how a unit change in covariate values impacts these quantities. For illustrative purposes, the same covariates as those considered in the patient-specific case are considered. Note that for generalized linear models, conditional and marginal covariate effects will generally differ unless certain random effects distributions and link functions are chosen.²⁵

As mentioned, marginal covariate effects on the probability of being disabled, $ξ$ , were obtained from $β / \sqrt{1 + σ_{bij}^{2}}$ with $β$ and $σ_{bij}^{2}$ (the variance of $B_{i} (t_{ij})$ ) estimated using the model fitting procedure. The shared OU process and random effect models, where $σ_{bij}^{2}$ does not depend on j (or i) for these models, were considered. For models with random walks, $σ_{bij}^{2}$ varies with j and therefore $ξ$ will have a time-dependent interpretation. For simplicity, these models are not considered. The shared random effect model was fitted using the parameterisation described by equation (20), with $g_{1} (y) = Φ (y)$ and $g_{2} (y) = y$ , and using the parameterisation described by equations (21) and (22), thus the same link functions are used and the inferences (at the population-level) from these models are comparable. These models will be denoted as shared random effect model-overall and -conditional respectively. Note that unlike at the population-level, the patient-specific assumptions from the shared random effect model-overall and -conditional are vastly different. The shared random effect-overall model assumes that the overall patient-specific mean, i.e. $E [Y_{ij} | B_{i}^{r} = b_{i}^{r}, C_{i}^{r} = c_{i}^{r}]$ , has a linear form, namely $Z_{ij}^{⊤} α + c_{i}^{r}$ . While the shared random effect-conditional model assumes that this quantity takes a particular non-linear form, namely $(Δ_{ij} + c_{i}^{r}) Φ (X_{ij}^{⊤} β + b_{i}^{r})$ . As before, the shared random effect models (both -overall and -conditional) were fitted using numerical integration and maximum likelihood estimation under the assumption that the positive values follow a normal distribution with constant variance.

Table 2 presents the results. Population-level covariate effects on the overall marginal mean are seen to be relatively similar across models due to the considerable overlap in confidence intervals. All three models are in strong agreement regarding the population-level covariate effect of the number of active joints. That is, on average, patients with an additional actively inflamed joint have an overall mean HAQ score increased by approximately 0.02. In contrast to the patient-specific case, the population-level covariate effects on the probability of being disabled are now more consistent across models. A generalized likelihood ratio test of

ρ_{gb} = 1

produced a p value of < 0.001 and therefore the shared OU process model is to be preferred over the shared random effect-conditional model. Log-likelihood values also indicate slight preference to the shared random effect-conditional model (−3507.63) over the shared random effect-overall model (−3582.78).

Table 2.

Table displaying population-level effects and corresponding 95% Wald intervals on the probability of being disabled and the overall marginal mean HAQ score.

	Shared OU process	Shared random effect-conditional	Shared random effect-overall
Binary component
Damaged joints	0.012 (0.006, 0.018)	0.0057 (0.0006, 0.011)	0.0066 (0.002, 0.011)
Active joints	0.05 (0.043, 0.058)	0.063 (0.054, 0.071)	0.051 (0.044, 0.058)
Sex	−0.65 (−0.88, −0.43)	−0.64 (−0.8, −0.48)	−0.61 (−0.76, −0.46)
Arthritis duration	0.013 (0.0057, 0.02)	0.014 (0.0095, 0.02)	0.012 (0.0078, 0.017)
Age at arthritis onset^a	0.19 (0.14, 0.24)	0.2 (0.13, 0.27)	0.19 (0.12, 0.26)
Intercept	0.59 (0.35, 0.83)	0.56 (0.42, 0.7)	0.68 (0.55, 0.81)
Overall marginal mean
Damaged Joints	0.0064 (0.0036, 0.0094)	0.0029 (0.00054, 0.0053)	0.004 (0.0016, 0.0065)
Active joints	0.02 (0.018, 0.022)	0.021 (0.019, 0.023)	0.022 (0.02, 0.024)
Sex	−0.3 (−0.4, −0.19)	−0.28 (−0.36, −0.21)	−0.31 (−0.39, −0.23)
Arthritis duration	0.0035 (0.00032, 0.0068)	0.006 (0.0038, 0.0082)	0.0054 (0.0031, 0.0077)
Age at arthritis onset^a	0.073 (0.049, 0.098)	0.08 (0.048, 0.11)	0.09 (0.052, 0.13)
Intercept	0.62 (0.5, 0.73)	0.61 (0.54, 0.68)	0.59 (0.52, 0.66)
θ	0.18 (0.16, 0.2)	0.27 (0.24, 0.29)	0.22 (0.2, 0.24)
$σ^{2}$	0.064 (0.057, 0.071)	0.12 (0.11, 0.12)	0.12 (0.11, 0.13)
$σ_{b}^{2}$		3.63 (2.92, 4.5)	5.19 (4.26, 6.34)
$σ_{gb}^{2}$	11.12 (8.71, 14.19)
ρ_gb	0.95 (0.94, 0.96)
Log−likelihood	−3277.2	−3507.63	−3582.78

Denotes the standardised version of the covariate.

7 Discussion

This paper reconsiders the flexible two-part models of Albert and Shen⁸ and Ghosh and Albert⁹ and proposes an efficient method of implementation. Specifically, the problem of integrating over high dimensional random effects is replaced by evaluating the cumulative distribution function of a multivariate normal distribution. This leads to efficient algorithms being employed and results in only an optimization procedure being required for model fitting. Furthermore, while retaining the flexibility of including stochastic processes and the practicality of an efficient model fitting procedure, this paper also provides model parameterisations which allow easily interpretable covariate effects to act on the overall marginal mean. The proposed methodology was applied to a psoriatic arthritis data set with extensive follow-up information.

Through their application and a simulation study, Albert and Shen⁸ demonstrated that overall conditional means (conditional on realisations of stochastic processes) may suffer from bias if serial correlation is present but a shared random effect model is used instead. Furthermore, as the shared random effect model becomes more misspecified (ρ_gb decreases from one), the degree of bias increases. However, under the same set-up, overall marginal means were less susceptible to bias. In the motivating application, the estimated degradation parameters from the shared OU process models were ${\hat{ρ}}_{gb} = 0.95$ in both applications (Sections 5 and 6.1). The reasonably high estimated correlation may therefore explain why the shared random effect model $(ρ_{gb} = 1)$ was a reasonable approximation in terms of estimating regression coefficients, although it was substantially the worst fitting model.

Preliminary analyses suggested shared process models were reasonable for our data since $\hat{ρ}, {\hat{ρ}}_{w}$ and ${\hat{ρ}}_{g} \approx 1$ when the described bivariate processes models were fitted. Although this may not be surprising as both parts of the model are describing the same response process, it is worth noting that the estimated correlation parameter (between processes) can in principle take a value between $(- 1, 1)$ as evidenced in other works.^7,9 Our preliminary analyses also demonstrated the need for careful evaluation of models fitted as problems with over fitting may arise. This was evident when the estimated random variation parameter was estimated to be virtually zero (i.e. ${\hat{σ}}^{2} \approx 0$ ) and the observed Fisher information matrix was non positive-definite, even when a considerably smaller tolerance level than the default was specified for computing multivariate normal probabilities.

As mentioned in Section 6, the proposed model parameterisations were motivated by making inference on the overall marginal mean. In this regard, covariate effects (both patient-specific and population-level) on the mean of the positive values and its correlation structure were assumed not of interest. If the mean of the positive values is of primary interest, it would be more sensible to directly use equations (18) and (19), as in Section 5, to obtain patient-specific effects or derive a similar parameterisation, as in Section 6, to obtain population-level effects.

A limitation of the current framework is that it is based on the assumption $g (Y_{ij}) | Y_{ij} > 0$ is approximately Gaussian with constant variance $σ^{2}$ . Specifically, in situations where $g (\cdot)$ is required to be complex so that this assumption will at least approximately hold, the resulting inferential targets will no longer be intuitively interpretable owing to the complexity of the transformation function. One approach that may weaken the need to assume normality of $g (Y_{ij}) | Y_{ij} > 0$ , particularly when the outcome exhibits a large amount of right skewness (e.g. medical expenditures), would be to make the alternative assumption $g (Y_{ij}) | Y_{ij} > 0$ follows a log-normal distribution. This may allow less complex and hence more interpretable transformation functions to be applied to the outcome without having to strongly violate the assumption on $g (Y_{ij}) | Y_{ij} > 0$ . Under this alternative assumption, we provide details in the supplementary materials of how easily interpretable inference on the overall marginal mean and on the mean of the positive transformed outcomes can be obtained with computationally efficient likelihoods. Similar techniques to those in the supplementary materials can also be used when the assumption $g (Y_{ij}) | Y_{ij} > 0$ follows a log-skew-normal distribution is of interest. Although this comes at the cost of having an increased number of integrations in the marginal likelihood.

Finally, the model described by equations (18) and (19) with possible simplifications described in Appendix 2 is very general. Although it was derived in the context of longitudinal semicontinuous data, it contains the model described by Barrett et al.¹⁵ for the longitudinal and survival outcomes setting and implicitly provides a model for clustered cross-sectional semicontinuous data, where the index (i, j) specifies the jth outcome from the ith cluster. The multivariate normal identity described in Section 4.2 can also facilitate the fitting of flexible models describing clustered binary data and continuous bounded outcome data.¹⁴ However, it should be noted that care is required when specifying an appropriate/suitable correlation structure. Particularly, the covariance matrix must be constrained to be symmetric and positive semi-definite otherwise the model fitting procedure will likely be problematic, as was found here. For these alternative situations, the proposed methodology does nevertheless offer a strong basis, especially with regard to implementation, for the developing of flexible models.

Supplemental Material

Supplemental material for Two-part models with stochastic processes for modelling longitudinal semicontinuous data: Computationally efficient inference and modelling the overall marginal mean

Supplemental Material for Two-part models with stochastic processes for modelling longitudinal semicontinuous data: Computationally efficient inference and modelling the overall marginal mean by Sean Yiu and Brian DM Tom in Statistical Methods in Medical Research

Footnotes

Acknowledgments

We are grateful to Professor Vernon T. Farewell for providing general discussions on this research. We also acknowledge the patients in the Toronto Psoriatic Arthritis Clinic.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was financially supported by the UK Medical Research Council [Unit program numbers U105261167 and MC_UP_1302/3].

Appendix 1. Rearranging equation ( 11 )

Appendix 2. Simplification for correlated stochastic processes model

Appendix 3. Overall marginal mean of shared process model

References

Olsen

Schaefer

. A two-part random effects model for semicontinuous longitudinal data. J Am Stat Assoc 2001; 96: 730–745.

Xie

McHugo

Sengupta

et al.

A method for analyzing longitudinal outcomes with many zeros. Ment Health Serv Res 2004; 6: 239–246.

Smith

Preisser

Neelon

et al.

A marginalized two-part model for semicontinuous data. Stat Med 2014; 33: 4891–4930.

Smith

Neelon

Preisser

et al.

A marginalized two-part model for longitudinal semicontinuous data. Stat Meth Med Res 2015; July: 1–24.

Tom

BDM

Farewell

. A corrected formulation for marginal inference derived from two-part mixed models for longitudinal semi-continuous data. Stat Methods Med Res 2016; 25: 2014–2020.

Hall

Zhang

. Marginal models for zero-inflated clustered data. Stat Model 2004; 4: 161–180.

Tooze

Grunwald

Jones

. Analysis of repeated measures data with clumping at zero. Stat Meth Med Res 2002; 11: 341–355.

Albert

Shen

. Modelling longitudinal semicontinuous emesis volume data with serial correlation in an acupuncture clinical trial. J R Stat Soc: Ser C 2005; 54: 707–720.

Ghosh

Albert

. A Bayesian analysis for longitudinal semicontinuous data with an application to an acupuncture clinical trial. Comput Stat Data Anal 2009; 53: 699–706.

10.

Bruce

Fries

. The Stanford health assessment questionnaire: dimensions and practical applications. Health Qual Life Outcomes 2003; 1: 1–20.

11.

Husted

Tom

Farewell

et al.

A longitudinal study of the effect of disease activity and clinical damage on physical function over the course of psoriatic arthritis: does the effect change over time?

Arthritis Rheum 2007; 56: 840–849.

12.

Tom

Farewell

. Bias in two-part mixed models for longitudinal data. Biostatistics 2009; 10: 374–389.

13.

Tom

Farewell

. A likelihood-based two-part marginal model for longitudinal semicontinuous data. Stat Meth Med Res 2015; 24: 194–205.

14.

Hutmacher

French

Krishnaswami

et al.

Estimating transformations for repeated measures modelling of continuous bounded outcome data. Stat Med 2010; 30: 935–949.

15.

Barrett

Diggle

Henderson

Taylor-Robinson

. Joint modelling of repeated measurements and time-to-event outcomes: flexible model specification and exact likelihood inference. J R Stat Soc: Ser B 2015; 77: 131–148.

16.

Azzalini

. A class of distributions which includes the normal ones. Scand J Stat 1985; 12: 171–178.

17.

Azzalini

. The skew-normal distribution and related multivariate families (with discussion). Scand J Stat 1985; 32: 159–200.

18.

Arnold

Beaver

. Hidden truncation models. Sankhya A 2000; 62: 22–35.

19.

Arnold

. Flexible univariate and multivariate models based on hidden truncation. J Stat Plan 2009; 139: 3741–3749.

20.

R Development Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing: Vienna, Austria, http://www.R-project.org, ISBN 3-900051-07-0 (accessed 6 May 2017)..

21.

Azzalini

. mnormt: the multivariate normal and t distributions. R package version 1.5-2 2015. http://azzalini.stat.unipd.it/SW/Pkg-mnormt (2015).

22.

Broyden

. The convergence of a class of double-rank minimisation algorithms. J Inst Math Appl 1970; 6: 76–90.

23.

Henderson

Shimakura

. A serially correlated gamma frailty model for longitudinal count data. Biometrika 2003; 90: 355–366.

24.

Albert

. Letter to the editor. Biometrics 2005; 47: 879–881.

25.

Diggle

Heagerty

Liang

et al.

Analysis of longitudinal data, New York: Oxford University Press, 2002.

26.

Owen

. A table of normal integrals. Commun Stat B Simul 1980; 9: 389–419.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.21 MB

Two-part models with stochastic processes for modelling longitudinal semicontinuous data: Computationally efficient inference and modelling the overall marginal mean

Abstract

Keywords

1 Introduction

2 Functional disability in psoriatic arthritis

3 Model

3.1 Correlated Gaussian processes

3.1.1 Remarks on ρgb, ρgc and ρgbc

3.2 Correlated random walks

4 Efficient maximum likelihood estimation procedure for stochastic processes models

4.1 Likelihoods

4.2 Multivariate normal identity

4.3 Re-expressing the likelihoods

5 Application: Patient-specific inference

6 Modelling the overall marginal mean

6.1 Population-based inference

7 Discussion

Supplemental Material

Supplemental material for Two-part models with stochastic processes for modelling longitudinal semicontinuous data: Computationally efficient inference and modelling the overall marginal mean

Footnotes

Acknowledgments

Declaration of conflicting interests

Funding

Appendix 1. Rearranging equation ( 11 )

Appendix 2. Simplification for correlated stochastic processes model

Appendix 3. Overall marginal mean of shared process model

References

Supplementary Material

3.1.1 Remarks on ρ_gb, ρ_gc and ρ_gbc