A combined overdispersed longitudinal model for nominal data

Abstract

Longitudinal studies involving nominal outcomes are carried out in various scientific areas. These outcomes are frequently modelled using a generalized linear mixed modelling (GLMM) framework. This widely used approach allows for the modelling of the hierarchy in the data to accommodate different degrees of overdispersion. In this article, a combined model (CM) that takes into account overdispersion and clustering through two separate sets of random effects is formulated. Maximum likelihood estimation with analytic-numerical integration is used to estimate the model parameters. To examine the relative performance of the CM and the GLMM, simulation studies were carried out, exploring scenarios with different sample sizes, types of random effects, and overdispersion. Both models were applied to a real dataset obtained from an experiment in agriculture. We also provide an implementation of these models through SAS code.

Keywords

Multinomial distribution, beta distribution, hierarchical data

1 Introduction

Nominal data may arise from studies in many different subject areas, such as medicine, marketing, education, and agriculture. A nominal outcome has its measurement scale formed by a set of categories that have no intrinsic order, being classified as binary, if only two categories are observed (e.g., dead or alive), or polytomous, if three or more categories are observed (e.g., political party affiliation: democrat, republican, or independent). Although polytomous responses are qualitative, all nominal outcomes may be written as a set of binary variables (Agresti, 2010; Hartzel et al., 2001; Clayton, 1992). For cross-sectional data, a whole collection of modelling approaches can be used, such as the generalized linear modelling (GLM) framework based on the exponential family of distributions (Nelder and Wedderburn, 1972).

One of the key features of the GLM framework is the mean-variance relationship, where the variance is a deterministic function of the mean. For example, for Bernoulli outcomes with success probability $μ = π$, the variance is $v (μ) = π (1 - π)$, which may be overly restrictive depending on the data generating process. Scenarios where the variance is larger or smaller than the mean are reported in the literature as over- or under-dispersion, respectively (Grunwald et al., 2011; Demétrio et al., 2014). For purely binary data, however, hierarchies need to be present for the mean-variance relationship to be violated (Molenberghs et al., 2010, 2012, 2017). Therefore, data from studies where several measurements are taken from the same cluster, subject, or sample unit over time (i.e., longitudinal studies) could violate this assumption.

Some of the main approaches used to analyse longitudinal data with nominal outcomes are generalized estimating equations (GEE) (Liang and Zeger, 1986; Lipsitz et al., 1994; Touloumis et al., 2013), transition models (TM) (Diggle et al., 2002; Molenberghs and Verbeke, 2005; Lara et al., 2017) and generalized linear mixed models (GLMM) (Hartzel et al., 2001; Diggle et al., 2002; Hedeker, 2003; Molenberghs and Verbeke, 2005). These widely used approaches allow for accommodating the correlation between observations induced by the hierarchy of the data collection process, as well as extra-variability. Molenberghs et al. (2007), Molenberghs et al. (2010), Molenberghs et al. (2012), Ivanova et al. (2014), and Molenberghs et al. (2017) showed that accommodating either one of overdispersion or hierarchically induced association may fall short of properly modelling the data. Therefore, they proposed a combined modelling framework encompassing both. Molenberghs et al. (2007) focussed on counts, Molenberghs et al. (2010) laid out a general framework, Molenberghs et al. (2012) worked with binary and binomial outcomes, Ivanova et al. (2014) tackled ordinal outcomes, whereas Molenberghs et al. (2017) contributed with a review of all proposed combined models. Here, we propose a combined model (CM) for nominal outcomes to incorporate the hierarchical data collection process, as well as extra-variability by using two different sets of random effects. Note that Lee and Nelder (1996, 2001, 2003) proposed hierarchical generalized linear models, where random effects can be non-normal, and conjugate, as well. Here, we combine these with normal random effects in the linear predictor.

The remainder of this article is organized as follows. In Section 2, a motivating case study, stemming from an agricultural experiment on elephant grass pasture and dairy cows, is introduced. Basic elements for our modelling framework, standard generalized linear models, extensions for overdispersion, the generalized linear mixed model, and the combined modelling framework are the subject of Section 3. The proposed combined model (CM) is described in Section 4, while parameter estimation is the focus of Section 5. A simulation study comparing CM and GLMM is described and results presented in Section 6, while the case study is analysed in Section 7. We offer concluding remarks in Section 8. Finally, we provide the algebraic development in Appendix A and how to implement these models in SAS as Supplementary Materials.

2 Grazing management dataset

This dataset was collected from an experiment on elephant grass pastures (Pennisetum purpureum Schum. cv. Napier) grazed by dairy cows (Pereira et al., 2015a, b). It was set up in a randomized complete block design with the treatments allocated according to a $2 \times 2$ factorial arrangement, where treatments are the combinations of two pre-grazing conditions (95% and maximum canopy light interception during regrowth) and two post-grazing heights (35 and 45 cm). The experiment was carried out from January 2011 until April 2012, encompassing six seasons: 'Summer 1' (Jan-Mar 2011), 'Autumn' (Apr-June 2011), 'Winter' (July-Sept 2011), 'Early Spring' (Oct-mid-Nov 2011), 'Late Spring' (mid-Nov-Dec 2011) and 'Summer 2' (Jan-Apr 2012).

The response variable is the type of vegetation observed in the field, with three categories: 'weeds', 'bare ground', or 'tussocks'. Forty points were observed in each one of the four paddocks in each block. The data consist of the total number of points where each category was observed, characterising a multinomial outcome with three levels. There are $40 \times 16 = 640$ points per season, but in the early spring, one of the paddocks was affected by frost damage and thus the total number of observations was $640 \times 6 - 40 = 3800$. A sample of the dataset and a sketch of the experiment are shown in Table 1 and Figure 1, respectively.

Table 1

First and last four of the 3,800 rows in the grazing management dataset.

Seasons	Blocks	Pre-grazing	Post-grazing	Point within paddock	Outcome*
Summer 1	1	maximum	35	1	3
Autumn	1	maximum	35	1	1
Winter	1	maximum	35	1	2
Early Spring	1	maximum	35	1	3
⋮	⋮	⋮	⋮	⋮	⋮
Winter	4	95	45	640	1
Early Spring	4	95	45	640	3
Late Spring	4	95	45	640	3
Summer 2	4	95	45	640	3

* where weed = 1, bare ground = 2 and tussock = 3

Figure 1

Sketch of the design of the grazing management experiment. Each of the four blocks consists of four paddocks, each one with a combination of the levels of two treatment factors. Forty points were observed within each paddock.

3 Building blocks

Here, we briefly present the main concepts to formulate the combined model for nominal outcomes. In Section 3.1, we introduce the exponential family, generalized linear models and overdispersion. In Section 3.2, we present some properties of generalized linear mixed models and the general framework of combined models.

3.1 Generalized linear models and overdispersion

The class of generalized linear models (GLM) was introduced by Nelder and Wedderburn (1972) as a framework for handling a range of common statistical models for Gaussian and non-Gaussian data. A GLM is defined in terms of three components.

The first component is a set of independent random variables, $Y_{1}, \dots, Y_{N}$ , with probability or density function that belongs to the exponential family:

f (y_{i} ∣ η_{i}, ϕ) = exp \{ϕ^{- 1} [y_{i} η_{i} - ψ (η_{i})] + c (y_{i}, ϕ)\},

(3.1)

where $ψ (\cdot)$ and $c (\cdot)$ are known functions and $ϕ$ and $η_{i}$ are called dispersion and natural or canonical parameter, respectively. The exponential family includes several distributions, such as the Gaussian, Bernoulli, binomial, Poisson, gamma and multinomial distributions. The first two moments of a distribution that belongs to the exponential family are given by

E (Y_{i}) = μ_{i} = ψ^{'} (η_{i}) and Var (Y_{i}) = σ^{2} = ϕ ψ^{″} (η_{i}) .

Thus, the mean and variance of these distributions are related through

σ^{2} = ϕ ψ^{″} [{ψ^{'}}^{- 1} (μ)] = ϕ v (μ),

with $v (\cdot)$ termed the variance function.
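As a brief worked instance of these relations (a standard textbook case, added here for illustration only), consider the Bernoulli distribution, for which $ϕ = 1$ and $ψ (η) = \ln (1 + e^{η})$. Differentiating once and twice recovers the mean and the variance function quoted in the Introduction:

```latex
\begin{aligned}
μ &= ψ^{'} (η) = \frac{e^{η}}{1 + e^{η}} = π, \\
v (μ) &= ψ^{″} (η) = \frac{e^{η}}{{(1 + e^{η})}^{2}} = π (1 - π) = μ (1 - μ) .
\end{aligned}
```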

The second component, called linear predictor or natural parameter, $η_{i}$ , is the quantity that incorporates the information about the independent variables into the model. The third component, called the link function, $h (\cdot)$ , provides the relationship between the linear predictor and the mean of the distribution as $μ_{i} = h (η_{i}) = h (x_{i}^{T} β)$ , where $x_{i}$ and $β$ are covariate vectors and fixed unknown regression coefficients, respectively.

The multinomial distribution is a natural starting point for analysis of polytomous outcomes. This distribution arises as a natural extension of the binomial distribution when each independent trial has more than two possible mutually exclusive outcomes. Consider a series of $m$ independent trials of an experiment, each resulting in one of $R$ mutually exclusive events $E_{1}, \dots, E_{R}$. In each replicate within the experiment, the probability of the occurrence of event $E_{r}$ is equal to $π_{r}$ with $\sum_{r = 1}^{R} π_{r} = 1$. Let $Y^{*} = {(Y_{1}, \dots, Y_{R})}^{T}$ denote the random vector of the number of occurrences of events $E_{1}, \dots, E_{R}$ out of $m$ trials, with $\sum_{r = 1}^{R} Y_{r} = m$. Let $y^{*} = {(y_{1}, \dots, y_{R})}^{T}$ represent a realization of $Y^{*}$, with $\sum_{r = 1}^{R} y_{r} = m$. Then, the random vector $Y^{*}$ is said to have a multinomial distribution with parameters $m$ and $π^{*} = {(π_{1}, \dots, π_{R})}^{T}$, and joint probability mass function given by

P (Y^{*} = y^{*}) = \frac{m!}{y_{1}! y_{2}! \dots y_{R}!} π_{1}^{y_{1}} π_{2}^{y_{2}} \dots π_{R}^{y_{R}}, \quad π_{r} \in [0, 1], \quad r = 1, \dots, R .

(3.2)

We can safely reduce the dimensionality of $Y^{*}$ and $π^{*}$ by removing their respective last elements, since they can be obtained from the remaining $R - 1$ categories. Then, define $Y = {(Y_{1}, \dots, Y_{R - 1})}^{T}$ , $π = {(π_{1}, \dots, π_{R - 1})}^{T}$ and the realization of $Y$ as $y = (y_{1}, \dots, y_{R - 1})$ . Without loss of generality, $Y$ follows a multinomial distribution with parameters $m$ and $π$ with joint probability function as in (3.2) with $y_{R} = m - \sum_{h = 1}^{R - 1} y_{h}$ and $π_{R} = 1 - \sum_{h = 1}^{R - 1} π_{h}$ . Hence, the mean and the variance of $Y$ are, respectively,

E (Y) = m π and  Var (Y) = m Δ (π),

(3.3)

where $Δ (π) = diag (π) - π π^{T}$. Note that, because $Y$ and $π$ have $R - 1$ elements, $Δ (π)$ is an $(R - 1) \times (R - 1)$ variance-covariance matrix of full rank, with diagonal elements $π_{r} (1 - π_{r})$ and off-diagonal elements $- π_{r} π_{r^{'}}$ for $r \neq r^{'}$.
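The moments in (3.3) can be illustrated numerically. The following Python sketch computes $m π$ and $m Δ (π)$ for the reduced vector of $R - 1$ counts; the function name and the probability values are hypothetical, and $m = 40$ simply echoes the forty points observed per paddock.

```python
def multinomial_moments(m, pi):
    """Mean vector and covariance matrix of the reduced multinomial vector Y.

    `pi` holds the first R - 1 category probabilities; the probability of the
    last category is implied as 1 - sum(pi).
    """
    if any(p < 0 for p in pi) or sum(pi) > 1:
        raise ValueError("pi must contain valid probabilities summing to at most 1")
    mean = [m * p for p in pi]
    # m * (diag(pi) - pi pi^T): pi_r (1 - pi_r) on the diagonal, -pi_r pi_s elsewhere
    cov = [
        [m * (p * (1 - p) if r == s else -p * q) for s, q in enumerate(pi)]
        for r, p in enumerate(pi)
    ]
    return mean, cov


# Hypothetical example: R = 3 categories, m = 40 trials
mean, cov = multinomial_moments(40, [0.2, 0.3])
```

The negative off-diagonal entries reflect that, for a fixed total $m$, a larger count in one category necessarily leaves fewer trials for the others.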

It is well known that (3.3) implies a restrictive variance function. According to Grunwald et al. (2011) and Demétrio et al. (2014), cases where the variance is greater than the mean are largely reported in the literature as overdispersion, which may occur due to the absence of relevant covariates, heterogeneity of sampling units, correlation induced by hierarchical structures and/or excess of zeros. It should be noted that underdispersion can occur as well (for instance, see Ribeiro Jr et al. (2020) and Morris and Sellers (2022) for extended approaches recently developed that can be used to accommodate underdispersion). Thus, it is important to adapt models to take into account deviations from assumed mean-variance relationships in order to avoid incorrect inferences (Hinde and Demétrio, 1998).

One possibility to extend the multinomial model to handle overdispersion is to multiply the multinomial covariance matrix by a constant scalar parameter. A quasi-likelihood approach using a scale adjustment was presented by McCullagh and Nelder (1983), and later extended by Morel and Koehler (1995) to allow for different levels of overdispersion for each category using a diagonal matrix of overdispersion parameters and a Cholesky decomposition of the multinomial variance-covariance matrix. A mixture of distributions can also be used to allow for overdispersion, such as the random-clumped multinomial distribution proposed by Morel and Nagaraj (1993) and Neerchal and Morel (1998). This model is a finite mixture of multinomial distributions that captures the extra-variability caused by clumped sampling. Another convenient route to take overdispersion into account is through a two-stage approach, which considers a probability distribution for a model parameter, yielding a mixture. For instance, a multinomial model where the parameter $π$ follows a Dirichlet distribution is called the Dirichlet-multinomial model (Mosimann, 1962). Although the estimation of a dispersion parameter might provide some flexibility to standard GLMs, this is not always sufficient, especially when hierarchical structures or highly variable data arise.

3.2 Generalized linear mixed models and combined models

When analysing non-Gaussian data that are hierarchically organized (repeated measures or clustering, for example), the generalized linear mixed model (GLMM) is a popular choice (Molenberghs and Verbeke, 2005; Diggle et al., 2002). In full generality, one assumes that, conditionally on $q$-dimensional random effects $b_{i}$, assumed to be drawn independently from a normal distribution, $N_{q} (0, D)$, the outcomes $Y_{i j}$ measured on the $i$-th subject or sample unit at the $j$-th time point $(i = 1, \dots, N; j = 1, \dots, n_{i})$ are independent with densities of the form:

f_{i} (y_{i j} ∣ b_{i}, β, ϕ) = exp \{ϕ^{- 1} [y_{i j} η_{i j} - ψ (η_{i j})] + c (y_{i j}, ϕ)\},

(3.4)

where $η_{i j} = η (μ_{i j}) = η [E (Y_{i j} ∣ b_{i})] = x_{i j}^{T} β + z_{i j}^{T} b_{i}$ is the canonical parameter, with $x_{i j} (z_{i j})$ the design vector for the fixed (random) effects. Finally, let $f (b_{i} ∣ D)$ be the density of the Gaussian distribution, $N (0, D)$ , for the random effects $b_{i}$ .

For nominal data, it is assumed that the outcome $Y_{i j}$ can take values $r = 1, \dots, R$. Without loss of generality, we can replace it with a set of $R$ dummy variables where $W_{r, i j}$ is equal to 1 if $Y_{i j} = r$ and 0 otherwise. Evidently, there are redundant dummies, but any subset of $R - 1$ components is not, as described in Section 3.1. Thus, $W_{i j} \sim multinomial (π_{i j})$ with probabilities $π_{i j} = {(π_{1, i j}, \dots, π_{r, i j}, \dots, π_{R, i j})}^{T}$. Assuming that category $R$ is the reference category, a baseline-category logit model (Agresti, 2010; Hartzel et al., 2001) can be written as

ln (\frac{π_{r, i j}}{π_{R, i j}}) = η_{r, i j} = x_{i j}^{T} β_{r} + z_{i j}^{T} b_{r, i}, r = 1, \dots, R - 1,

b_{r, i} \sim N (0, D)

where $β_{r}$ is the fixed-effects coefficient vector of length $p + 1$ , corresponding to an intercept and $p$ covariates, and $b_{r, i}$ is the random-effects vector, following a multivariate normal distribution. The probabilities of each category for the $i$ -th subject and $j$ -th time can be expressed as

π_{r, i j} = \begin{cases} \frac{exp (x_{i j}^{T} β_{r} + z_{i j}^{T} b_{r, i})}{1 + \sum_{h = 1}^{R - 1} exp (x_{i j}^{T} β_{h} + z_{i j}^{T} b_{h, i})} & \text{if } 1 \leq r \leq R - 1, \\ 1 - \sum_{h = 1}^{R - 1} π_{h, i j} & \text{if } r = R . \end{cases}
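The mapping from the $R - 1$ linear predictors to the $R$ category probabilities can be sketched in a few lines of Python. The function name is hypothetical, and the inputs are assumed to be the already evaluated values $η_{r, i j} = x_{i j}^{T} β_{r} + z_{i j}^{T} b_{r, i}$ for a single observation.

```python
import math


def baseline_category_probs(etas):
    """Return [pi_1, ..., pi_R] from linear predictors eta_1, ..., eta_{R-1}.

    Category R is the reference category, whose linear predictor is fixed at 0.
    """
    # Subtracting a common shift leaves the ratios unchanged but avoids overflow
    shift = max(0.0, max(etas))
    numerators = [math.exp(eta - shift) for eta in etas]
    reference = math.exp(-shift)
    denominator = reference + sum(numerators)
    return [num / denominator for num in numerators] + [reference / denominator]


# Hypothetical linear predictors for R = 3 categories
probs = baseline_category_probs([0.1, 0.2])
```

By construction the probabilities sum to one, and $\ln (π_{r} / π_{R})$ returns the corresponding linear predictor.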

Estimates of $β, D$ and $ϕ$ for GLMMs are obtained by maximizing the marginal likelihood, computed by integrating out the random effects and commonly written as:

L (β, D, ϕ) = \prod_{i = 1}^{N} \int_{- \infty}^{\infty} \prod_{j = 1}^{n_{i}} f_{i} (y_{i j} ∣ b_{i}, β, ϕ) f (b_{i} ∣ D) d b_{i} .

(3.5)

The key problem in maximizing (3.5) is the presence of $N$ integrals over the random effects. Here, numerical methods are needed, such as adaptive Gaussian quadrature (Molenberghs and Verbeke, 2005; Pinheiro and Bates, 1995).
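To make the role of quadrature in (3.5) concrete, the following Python sketch approximates the contribution of one unit in a deliberately simplified setting: binary responses with a single random intercept, and a fixed (non-adaptive) three-point Gauss-Hermite rule. The function names and the reduction to a binary outcome are illustrative assumptions; the analyses reported in this article rely on adaptive Gaussian quadrature in SAS, which recentres and rescales the nodes for each unit.

```python
import math

# Standard three-point Gauss-Hermite rule for integrals of exp(-x^2) g(x)
GH_NODES = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
GH_WEIGHTS = (
    math.sqrt(math.pi) / 6.0,
    2.0 * math.sqrt(math.pi) / 3.0,
    math.sqrt(math.pi) / 6.0,
)


def normal_expectation(g, d):
    """Approximate E[g(b)] for b ~ N(0, d) via the substitution b = sqrt(2 d) x."""
    scale = math.sqrt(2.0 * d)
    total = sum(w * g(scale * x) for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / math.sqrt(math.pi)


def marginal_likelihood_unit(ys, etas_fixed, d):
    """Marginal likelihood of one unit's binary responses (random-intercept logit).

    ys: 0/1 responses; etas_fixed: fixed-effects part of each linear predictor;
    d: variance of the random intercept.
    """
    def conditional(b):
        lik = 1.0
        for y, eta in zip(ys, etas_fixed):
            p = 1.0 / (1.0 + math.exp(-(eta + b)))
            lik *= p if y == 1 else 1.0 - p
        return lik

    return normal_expectation(conditional, d)
```

In practice more nodes are used, and the number of nodes is increased until the estimates stabilise.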

While GLMMs, defined to accommodate within-unit correlation, also capture overdispersion to some extent, a single set of random effects may be insufficient to flexibly capture both. This led Molenberghs et al. (2007) to formulate a flexible and unified modelling framework, which they termed the combined model. These authors brought together two sets of random effects: the normally distributed subject-specific random effects to capture correlation and a certain amount of overdispersion, and a conjugate measurement-specific random effect on the natural parameter scale to accommodate the remaining overdispersion. Incorporating these two sets of random effects into the generalized linear model framework, the following general family is introduced:

f_{i} (y_{i j} ∣ b_{i}, β, θ_{i j}, ϕ) = exp \{ϕ^{- 1} [y_{i j} λ_{i j} - ψ (λ_{i j})] + c (y_{i j}, ϕ)\},

(3.6)

with notation similar to the one used in (3.4), but now with conditional mean

E (Y_{i j} ∣ b_{i}, β, θ_{i j}) = μ_{i j}^{c} = θ_{i j} κ_{i j},

(3.7)

where $θ_{i j} \sim G_{i j} (ϑ_{i j}, ξ_{i j})$ is a conjugate random variable and $κ_{i j} = g (x_{i j}^{T} β + z_{i j}^{T} b_{i})$ , where $g$ is the inverse of the canonical link function (Molenberghs et al., 2010). Note that we use $g (\cdot)$ rather than $h (\cdot)$ , because this transformation applies to only a part of the mean function. Finally, as before, $b_{i} \sim N (0, D)$ . Unlike in GLMM, we now have two different notations, $η_{i j}$ and $λ_{i j}$ , to refer to the linear predictor and the natural parameter, respectively (i.e., $λ_{i j}$ encompasses $θ_{i j}$ , while $η_{i j}$ refers to the 'GLMM part' only; Molenberghs et al., 2010). Regarding $θ_{i j}$ , three assumptions can be made: (a) they are independent; (b) they are correlated, implying that the collection of univariate distributions $G_{i j} (ϑ_{i j}, ξ_{i j})$ needs to be replaced with a multivariate one; and (c) they are shared, in the sense that there is only one realization per cluster, useful in applications with exchangeable outcomes. Assumption (c) is the one adopted in the analysis of the examples. Obviously, parameterization (3.7) allows for random effects $θ_{i j}$ capturing overdispersion, while formulated directly at the mean scale. Opting for a common conjugate random effect for all logits ensures that it can be interpreted as a single overdispersion parameter.

The relationship between the mean and the natural parameter is now given by the function $h$ :

λ_{i j} = h^{- 1} (μ_{i j}^{c}) = h^{- 1} (θ_{i j} κ_{i j}) .

We can still apply standard GLM ideas to derive the mean and variance, combined with iterated expectation-based calculations. For the mean, if $θ_{i j}$ and $b_{i}$ are independent, it follows that

E (Y_{i j}) = E (θ_{i j}) E (κ_{i j}) = E [h (λ_{i j})] .

Molenberghs et al. (2010) and Molenberghs et al. (2017) derived explicit expressions for the means, variances, and marginal densities for a number of outcome types, such as Gaussian, Poisson, and time-to-event data. This is not possible for binary data modelled with a logit link and including Gaussian random effects, whether or not other random effects are present.

4 Combined model for nominal outcomes

Analogously to Ivanova et al. (2014), we use a baseline-category logit structure (3.5) and include Gaussian random effects $b_{r, i} \sim N (0, D)$ in the linear predictor, as well as beta random effects $θ_{i j} \sim Beta (ϑ, ξ)$ to capture overdispersion (considering $θ_{i j}$ and $b_{r, i}$ independent). We may then write the probabilities of the proposed combined model as:

π_{r, i j} = \begin{cases} θ_{i j} κ_{r, i j} & \text{if } 1 \leq r \leq R - 1, \\ 1 - \sum_{h = 1}^{R - 1} θ_{i j} κ_{h, i j} & \text{if } r = R, \end{cases}

and

κ_{r, i j} = \frac{exp (x_{i j}^{T} β_{r} + z_{i j}^{T} b_{r, i})}{1 + \sum_{h = 1}^{R - 1} exp (x_{i j}^{T} β_{h} + z_{i j}^{T} b_{h, i})} if 1 \leq r \leq R - 1,

(4.1)

where $β_{r}$ is the vector of fixed effects (regression coefficients) for each one of the $(R - 1)$ categories, and $x_{i j}$ and $z_{i j}$ are the design vectors for the fixed and random effects, respectively. We considered here the case where $θ_{i j}$ is constant across all categories, resulting in a combined model with constant overdispersion (although one may allow $θ_{i j}$ to depend on covariates through an appropriate link function).
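A minimal Python sketch of the probabilities in (4.1), for a single observation and a given value of $θ_{i j}$, is shown here. The function name is hypothetical; the linear predictors are assumed to have been evaluated beforehand, including the normal random effects.

```python
import math


def combined_model_probs(etas, theta):
    """Category probabilities [pi_1, ..., pi_R] of the combined model.

    etas: linear predictors for categories 1, ..., R - 1 (category R is reference)
    theta: value of the beta-distributed overdispersion effect, in [0, 1]
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    denominator = 1.0 + sum(math.exp(eta) for eta in etas)
    kappas = [math.exp(eta) / denominator for eta in etas]
    scaled = [theta * kappa for kappa in kappas]
    # The reference category absorbs the probability removed from the others
    return scaled + [1.0 - sum(scaled)]


# Hypothetical values: equal linear predictors, theta = 0.5
probs = combined_model_probs([0.0, 0.0], 0.5)
```

Setting $θ_{i j} = 1$ recovers the baseline-category logit probabilities, whereas values below one shift probability from the first $R - 1$ categories towards the reference category.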

5 Parameter estimation

Molenberghs et al. (2007) and Molenberghs et al. (2010) showed that fitting the combined model is relatively easy, and that standard software tools can be used for maximum likelihood estimation in this case. A priori, fitting a combined model of the type described in Section 4 is done by maximizing the log-likelihood while integrating over the random effects. The joint distribution of the $i j$ -th observation, assuming $θ_{i j}$ and $b_{r, i}$ independent, is given by

f (w_{i j}, b_{r, i}, θ_{i j}) = f (w_{i j} ∣ b_{r, i}, θ_{i j}) f (b_{r, i}) f (θ_{i j}),

and the likelihood function can be written as:

L (β, D, ϑ, ξ) = \prod_{i = 1}^{N} \iint \prod_{j = 1}^{n_{i}} f (w_{i j} ∣ β, b_{r, i}, θ_{i j}) f (b_{r, i} ∣ D) f (θ_{i j} ∣ ϑ, ξ) d b_{r, i} d θ_{i j} .

For our proposed model, the three functions in the integrand are, in order, the multinomial, normal and beta distributions probability density or mass functions, which yields:

\begin{array}{l} L (β, D, ϑ, ξ) = \prod_{i = 1}^{N} \iint \prod_{j = 1}^{n_{i}} \prod_{h = 1}^{R - 1} {(θ_{i j} κ_{h, i j})}^{w_{h, i j}} {(1 - \sum_{h = 1}^{R - 1} θ_{i j} κ_{h, i j})}^{1 - \sum_{h = 1}^{R - 1} w_{h, i j}} \\ \frac{1}{\sqrt{{(2 π)}^{n_{i}}}} \frac{1}{\sqrt{| D |}} \exp (- \frac{1}{2} b_{r, i}^{T} D^{- 1} b_{r, i}) \frac{θ_{i j}^{ϑ - 1} {(1 - θ_{i j})}^{ξ - 1}}{B (ϑ, ξ)} d b_{r, i} d θ_{i j} . \end{array}

(5.1)

The key problem in maximizing (5.1) is the presence of $N$ integrals over the random effects $b_{r, i}$ and $θ_{i j}$ , making this process time consuming and cumbersome to implement if the $N$ integrals are to be calculated using numerical methods. However, we can simplify by integrating analytically over the beta random effects, but not over the normal random effects, leading to a partially marginalized density. In our case, this takes the form (details of calculations are presented in Appendix A):

\begin{matrix} L (β, D, ϑ, ξ) = \prod_{i = 1}^{N} \int \prod_{j = 1}^{n_{i}} \prod_{h = 1}^{R - 1} {(\frac{ϑ}{ϑ + ξ} κ_{h, i j})}^{w_{h, i j}} {(1 - \frac{ϑ}{ϑ + ξ} \sum_{h = 1}^{R - 1} κ_{h, i j})}^{1 - \sum_{h = 1}^{R - 1} w_{h, i j}} \\ \frac{1}{\sqrt{{(2 π)}^{n_{i}}}} \frac{1}{\sqrt{|D|}} exp (- \frac{1}{2} b_{r, i}^{T} D^{- 1} b_{r, i}) d b_{r, i} \end{matrix}
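The analytic step can be checked numerically for a single observation. Because exactly one of the dummies $w_{h, i j}$ equals one, the conditional contribution is linear in $θ_{i j}$, so integrating it against the beta density replaces $θ_{i j}$ by its mean $ϑ / (ϑ + ξ)$. The Python sketch below compares a midpoint-rule integral with this closed form; all function names and the values of $κ$ are hypothetical.

```python
import math


def beta_pdf(t, a, b):
    """Density of the Beta(a, b) distribution at t in (0, 1)."""
    const = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return const * t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)


def contribution(theta, kappas, category):
    """Conditional multinomial contribution of one observation.

    category is 1-based; category len(kappas) + 1 is the reference category.
    """
    if category <= len(kappas):
        return theta * kappas[category - 1]
    return 1.0 - theta * sum(kappas)


def integrate_over_beta(kappas, category, a, b, n=20000):
    """Midpoint rule for the integral of contribution * beta density over (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += contribution(t, kappas, category) * beta_pdf(t, a, b)
    return h * total


def closed_form(kappas, category, a, b):
    """Partially marginalized contribution: theta replaced by its mean a / (a + b)."""
    return contribution(a / (a + b), kappas, category)
```

For example, with $ϑ = 5$, $ξ = 1$ and hypothetical $κ = (0.3, 0.4)$, both routes agree up to the numerical error of the midpoint rule for each of the three categories.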

Here, a generic maximum likelihood routine that allows for integration over normal random effects can be used. We follow this route and use the SAS procedure NLMIXED. We opted for the adaptive Gaussian quadrature method (Molenberghs and Verbeke, 2005) and chose the number of quadrature points $Q$ by performing a numerical sensitivity analysis to check whether it was sufficiently large. To ensure identifiability, a constraint needs to be applied. Here, we reparameterise $ϑ$ as $e^{δ} > 0$ and fix $ξ = 1$ . Therefore, larger $δ$ values correspond to weaker overdispersion.
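The effect of the reparameterisation can be made explicit with the standard moments of a $Beta (ϑ, 1)$ distribution, $E (θ) = ϑ / (ϑ + 1)$ and $Var (θ) = ϑ / [{(ϑ + 1)}^{2} (ϑ + 2)]$. The short Python sketch below (the function name is hypothetical) evaluates them as a function of $δ$.

```python
import math


def beta_moments_from_delta(delta):
    """Mean and variance of theta ~ Beta(exp(delta), 1)."""
    a = math.exp(delta)
    mean = a / (a + 1.0)
    variance = a / ((a + 1.0) ** 2 * (a + 2.0))
    return mean, variance


# delta = log(5) and log(20) correspond to the beta shape values 5 and 20
moments = {delta: beta_moments_from_delta(delta)
           for delta in (math.log(5.0), math.log(20.0), 9.0)}
```

As $δ$ grows, the mean of $θ_{i j}$ approaches one and its variance approaches zero, so the combined model approaches the GLMM.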

6 Simulation

A simulation study was conducted to compare the performance of a GLMM and the proposed combined model. We simulated longitudinal nominal data with $R = 3$ categories considering a baseline-category logit model, (3.5), and the following linear predictor:

η_{r, i j} = b_{r, i} + β_{r, 0} + β_{r, 1} {time}_{i j} + β_{r, 2} {group}_{i} + β_{r, 3} {time}_{i j} \times {group}_{i},

where ${time}_{i j} = (j - 1) / 6$ for $j = 1, \dots, 6$ and ${group}_{i} = 0$ or 1. The true parameter values were set as:

\begin{array}{l} β_{1} = {(β_{1, 0}; β_{1, 1}; β_{1, 2}; β_{1, 3})}^{T} = {(0.1; 0.2; 0.5; 0.7)}^{T} \\ β_{2} = {(β_{2, 0}; β_{2, 1}; β_{2, 2}; β_{2, 3})}^{T} = {(0.2; 0.1; 0.4; 0.5)}^{T} \end{array}

Finally, the random effects were specified as:

\begin{array}{rl} b_{r, i} & \sim N (0, D), \\ D & = \begin{pmatrix} d_{1} & c \\ c & d_{2} \end{pmatrix}, \end{array}

with $c = 0.5, d_{1} \in \{1, 9\}$ and $d_{2} \in \{0.5, 4.5\}$ , used to generate 'weak' and 'strong' (nine times higher) correlation values for the simulated datasets.

We simulated 200 datasets with $N = 300$ and $N = 600$ , and $n_{i} = 6, \forall i$ , with two groups of equal size each, that is, $N = 150$ and $N = 300$ experimental/observational units per group, respectively. Six scenarios with different magnitudes of random effects and overdispersion were generated to compare the behavior of the GLMM and the CM (Table 2).

Table 2

True parameter values used to specify six different scenarios (S1 - S6) used in the simulation study.

Scenario	Size	Variance components	Overdispersion
S1	$N = 300$ or 600	$d_{1} = 1.0, d_{2} = 0.5$	-
S2	$N = 300$ or 600	$d_{1} = 9.0, d_{2} = 4.5$	-
S3	$N = 300$ or 600	$d_{1} = 1.0, d_{2} = 0.5$	$ϑ = 5, ξ = 1$
S4	$N = 300$ or 600	$d_{1} = 1.0, d_{2} = 0.5$	$ϑ = 20, ξ = 1$
S5	$N = 300$ or 600	$d_{1} = 9.0, d_{2} = 4.5$	$ϑ = 5, ξ = 1$
S6	$N = 300$ or 600	$d_{1} = 9.0, d_{2} = 4.5$	$ϑ = 20, ξ = 1$

To generate data with overdispersion, the simulated probabilities were multiplied by values generated from a beta distribution with the shape parameters specified to 'strongly' $(ϑ = 5)$ or 'weakly' $(ϑ = 20)$ disturb the probabilities generated by the model, while fixing $ξ = 1$ (Table 3).
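The data-generating mechanism described above can be sketched in Python as follows. This is an illustration rather than the code used for the study: the function name is hypothetical, and we assume that one beta value is drawn independently for each observation and applied to the two non-reference probabilities as in (4.1).

```python
import math
import random

# True fixed effects (intercept, time, group, time x group) for logits 1 and 2
BETA = {1: (0.1, 0.2, 0.5, 0.7), 2: (0.2, 0.1, 0.4, 0.5)}


def simulate_dataset(n_units, d1, d2, c=0.5, vartheta=None, n_times=6, seed=1):
    """Simulate rows (unit, occasion, group, outcome) with outcome in {1, 2, 3}.

    vartheta=None generates data without overdispersion (theta fixed at 1).
    Assumption: theta is drawn independently for every observation.
    """
    rng = random.Random(seed)
    # Cholesky factor of D = [[d1, c], [c, d2]]
    l11 = math.sqrt(d1)
    l21 = c / l11
    l22 = math.sqrt(d2 - l21 ** 2)
    rows = []
    for i in range(n_units):
        group = 0 if i < n_units // 2 else 1
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        b = {1: l11 * z1, 2: l21 * z1 + l22 * z2}
        for j in range(1, n_times + 1):
            time = (j - 1) / 6
            expo = {}
            for r in (1, 2):
                b0, b1, b2, b3 = BETA[r]
                eta = b[r] + b0 + b1 * time + b2 * group + b3 * time * group
                expo[r] = math.exp(eta)
            denom = 1.0 + expo[1] + expo[2]
            theta = 1.0 if vartheta is None else rng.betavariate(vartheta, 1.0)
            p1 = theta * expo[1] / denom
            p2 = theta * expo[2] / denom
            u = rng.random()
            outcome = 1 if u < p1 else (2 if u < p1 + p2 else 3)
            rows.append((i + 1, j, group, outcome))
    return rows


# One dataset resembling scenario S6 with N = 300
data = simulate_dataset(300, d1=9.0, d2=4.5, vartheta=20.0)
```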

Table 3

Descriptive statistics of the 200 simulated overdispersion values.

Beta parameters	Min.	1st Quartile	Median	Mean	3rd Quartile	Max.
$ϑ = 5$ and $ξ = 1$	0.06	0.76	0.87	0.83	0.94	1
$ϑ = 20$ and $ξ = 1$	0.48	0.93	0.97	0.95	0.98	1
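For reference, the $Beta (ϑ, 1)$ distribution has distribution function $F (x) = x^{ϑ}$ on $[0, 1]$, so its quantiles and mean are available in closed form. The Python sketch below (function names are hypothetical) evaluates them; the theoretical medians and means agree with the sample values in Table 3 to two decimals.

```python
def beta_shape1_quantile(p, vartheta):
    """Quantile of Beta(vartheta, 1): solves x ** vartheta = p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p ** (1.0 / vartheta)


def beta_shape1_mean(vartheta):
    """Mean of Beta(vartheta, 1)."""
    return vartheta / (vartheta + 1.0)


# Theoretical quartiles, median and mean for the two overdispersion settings
summary = {
    vartheta: {
        "q1": beta_shape1_quantile(0.25, vartheta),
        "median": beta_shape1_quantile(0.50, vartheta),
        "mean": beta_shape1_mean(vartheta),
        "q3": beta_shape1_quantile(0.75, vartheta),
    }
    for vartheta in (5.0, 20.0)
}
```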

The GLMM and CM were fitted to the simulated datasets using the estimation method described in Section 5, which was implemented using SAS procedure NLMIXED. We approximated the integrals using adaptive Gaussian quadrature with 10 quadrature points, and optimized the loglikelihood using the quasi-Newton BFGS method. We used, as starting values for the fixed effects, the estimates obtained by fitting a GLM without random effects. For each scenario, we computed the average estimate (AE), bias, and mean squared error (MSE) for each parameter. We set $ϑ = e^{δ}$ and $ξ = 1$ for the combined model to ensure identifiability.
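The three performance summaries can be computed as in the following Python sketch, assuming the usual definitions (bias as the average estimate minus the true value, and MSE as the mean squared deviation from the true value). The function name and the example estimates are hypothetical.

```python
def summarise_estimates(estimates, true_value):
    """Return (AE, bias, MSE) for one parameter across simulated datasets."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    n = len(estimates)
    average = sum(estimates) / n
    bias = average - true_value
    mse = sum((est - true_value) ** 2 for est in estimates) / n
    return average, bias, mse


# Hypothetical estimates of a parameter whose true value is 0.5
ae, bias, mse = summarise_estimates([0.4, 0.6, 0.5], 0.5)
```

Only datasets for which the fitting procedure converged would enter such a summary.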

In general, the simulation results indicate the estimation procedure produced reliable results, showing that the MSEs of the maximum likelihood estimators of the parameters decay towards zero as the sample size increases, as expected under standard asymptotic theory. For scenarios S1, S2, and S4 (weak overdispersion and/or correlation), the results for both the GLMM and CM are similar (see Table 4, which displays results for scenario S2).

Table 4

Average estimate (AE), bias and mean square error (MSE) for the parameters estimated by the GLMM and CM based on 200 simulations for scenario S2.

Parameter	True	GLMM
		$N = 300$			$N = 600$
		AE	Bias	MSE	AE	Bias	MSE
$β_{1,0}$	0.1	0.123	0.023	0.118	0.112	0.012	0.057
$β_{1,1}$	0.2	0.196	-0.004	0.185	0.219	0.019	0.076
$β_{1,2}$	0.5	0.524	0.024	0.242	0.463	-0.037	0.127
$β_{1,3}$	0.7	0.689	-0.011	0.351	0.692	-0.008	0.153
$β_{2,0}$	0.2	0.233	0.033	0.086	0.193	-0.007	0.039
$β_{2,1}$	0.1	0.072	-0.028	0.144	0.149	0.049	0.061
$β_{2,2}$	0.4	0.402	0.002	0.130	0.428	0.028	0.077
$β_{2,3}$	0.5	0.478	-0.022	0.297	0.458	-0.042	0.151
$d_{1}$	9.0	9.220	0.220	2.921	9.055	0.055	1.303
$d_{2}$	4.5	4.558	0.058	0.682	4.462	-0.038	0.307
$c$	0.5	0.616	0.116	0.635	0.570	0.070	0.341
Parameter	True	CM
		$N = 300$			$N = 600$
		AE	Bias	MSE	AE	Bias	MSE
$β_{1,0}$	0.1	0.136	0.036	0.128	0.119	0.019	0.059
$β_{1,1}$	0.2	0.195	-0.005	0.187	0.221	0.021	0.077
$β_{1,2}$	0.5	0.572	0.027	0.244	0.466	-0.034	0.127
$β_{1,3}$	0.7	0.694	-0.006	0.355	0.694	-0.006	0.155
$β_{2,0}$	0.2	0.245	0.045	0.090	0.200	0.001	0.039
$β_{2,1}$	0.1	0.072	-0.028	0.145	0.150	0.050	0.062
$β_{2,2}$	0.4	0.405	0.005	0.132	0.431	0.031	0.078
$β_{2,3}$	0.5	0.483	-0.017	0.300	0.459	-0.041	0.153
$d_{1}$	9.0	9.288	0.288	2.747	9.110	0.110	1.364
$d_{2}$	4.5	4.589	0.089	0.687	4.481	-0.019	0.323
$c$	0.5	0.654	0.154	0.650	0.600	0.100	0.346
$δ$	-	9.482	-	-	9.188	-	-

However, if there is a pronounced overdispersion effect (S3) or if it coincides with high correlations (S5 and S6), better performances were observed for the CM, mainly for the variance components (Table 5 displays results for scenario S6). Even with a larger sample size, the predicted random effects for the CM showed smaller bias and MSE when compared to the GLMM.

Table 5

Average estimates (AE), bias and mean square errors (MSE) for the parameters estimated by the GLMM and CM based on 200 simulations for scenario S6.

Parameter	True	GLMM
		$N = 300$			$N = 600$
		AE	Bias	MSE	AE	Bias	MSE
$β_{1,0}$	0.1	-0.251	-0.351	0.215	-0.276	-0.376	0.182
$β_{1,1}$	0.2	0.149	-0.051	0.130	0.180	-0.020	0.072
$β_{1,2}$	0.5	0.380	-0.120	0.188	0.376	-0.124	0.105
$β_{1,3}$	0.7	0.524	-0.176	0.310	0.501	-0.199	0.201
$β_{2,0}$	0.2	-0.207	-0.407	0.228	-0.228	-0.428	0.220
$β_{2,1}$	0.1	0.068	-0.032	0.128	0.092	-0.008	0.064
$β_{2,2}$	0.4	0.293	-0.107	0.145	0.335	-0.065	0.064
$β_{2,3}$	0.5	0.353	-0.147	0.282	0.339	-0.161	0.156
$d_{1}$	9.0	6.297	-2.703	8.722	6.294	-2.706	7.916
$d_{2}$	4.5	3.509	-0.991	1.419	3.517	-0.983	1.153
$c$	0.5	-0.622	-1.122	1.559	-0.639	-1.139	1.455
Parameter	True	CM
		$N = 300$			$N = 600$
		AE	Bias	MSE	AE	Bias	MSE
$β_{1,0}$	0.1	0.115	0.015	0.169	0.058	-0.042	0.074
$β_{1,1}$	0.2	0.186	0.014	0.147	0.215	0.015	0.070
$β_{1,2}$	0.5	0.507	0.007	0.178	0.474	-0.026	0.107
$β_{1,3}$	0.7	0.688	-0.012	0.329	0.669	-0.031	0.199
$β_{2,0}$	0.2	0.201	0.001	0.117	0.141	-0.059	0.068
$β_{2,1}$	0.1	0.091	-0.009	0.122	0.117	0.017	0.054
$β_{2,2}$	0.4	0.398	-0.002	0.154	0.419	0.019	0.070
$β_{2,3}$	0.5	0.495	-0.005	0.284	0.483	-0.017	0.157
$d_{1}$	9.0	8.966	-0.034	4.965	8.636	-0.364	2.158
$d_{2}$	4.5	4.435	-0.065	1.056	4.344	-0.156	0.466
$c$	0.5	0.547	0.047	1.137	0.370	-0.130	0.505
$δ$	-	3.153	-	-	3.172	-	-

Table 6 presents the convergence rates for the two models across the 200 simulated datasets. The proportion of datasets for which convergence was achieved is slightly lower for the CM than for the GLMM. This can be attributed to sensitivity to the starting values, which points to the need for their careful selection. When convergence problems arise, we suggest starting the analysis with the GLM or GLMM estimates and, if necessary, attempting to fit the CM using different sets of starting values.

Table 6

Convergence rates for the GLMM and the CM in six simulated scenarios with 200 datasets.

Model	Size	Convergence rate (%) by scenario
		S1	S2	S3	S4	S5	S6
GLMM	300	98.5	100.0	97.3	98.5	100.0	100.0
GLMM	600	100.0	100.0	99.5	100.0	100.0	100.0
CM	300	97.5	100.0	91.0	95.5	100.0	100.0
CM	600	100.0	100.0	99.0	100.0	100.0	99.5

For a particular application, it might be useful to conduct a targeted simulation study to assess the convergence rate at the envisaged sample size.

7 Analysis of the grazing management data

Here, we present an analysis of the grazing management data, introduced in Section 2. This dataset has been previously analysed by Menarin and Lara (2017), through extended generalized estimating equations that use local odds ratios to explain the dependence among the categories (Touloumis et al., 2013). Here, emphasis is placed on a subject-specific interpretation. Let $Y_{i j} \in \{1, 2, 3\}$ denote the type of vegetation (weed, bare ground, and tussock, respectively) observed at target point $i$ $(i = 1, \dots, 640)$ in season $j$ $(j = 1, \dots, 6)$. Thus, under a baseline-category logit model, (3.5), the logits of the GLMM and the CM can be written as:

\text{logit 1}: \ln \left(\frac{π_{1, i j}}{π_{3, i j}}\right), \qquad \text{logit 2}: \ln \left(\frac{π_{2, i j}}{π_{3, i j}}\right),

where $π_{r, i j}$ is the probability of the $i$-th point being classified in the $r$-th category in season $j$. The first logit is the log-odds of weeds versus tussocks and the second is the log-odds of bare ground versus tussocks. As before, both models were fitted with the SAS procedure NLMIXED using adaptive Gaussian quadrature. We performed a sensitivity analysis by increasing the number of quadrature points up to 5, at which point the estimates had stabilized, and carried out maximization through the quasi-Newton BFGS method. We began with the complete model, including the fixed effects of blocks, pre- and post-grazing conditions, seasons, as well as all two- and three-way interactions between pre- and post-grazing conditions and seasons. We also included a random intercept per point within paddock.

We then performed backwards selection for the fixed effects, by fitting reduced models which did not include higher-order interactions, and carrying out likelihood-ratio tests until the model only included significant interactions and/or main effects. This process yielded the following linear predictor:

\begin{matrix} η_{r, i j} = b_{i} + β_{r, 0} + \overbrace{β_{r, 1} X_{11 i} + β_{r, 2} X_{12 i} + β_{r, 3} X_{13 i}}^{\text{blocks}} + \overbrace{β_{r, 4} X_{2 i}}^{\text{pre-grazing}} \\ + \overbrace{β_{r, 5} X_{31 i} + β_{r, 6} X_{32 i} + β_{r, 7} X_{33 i} + β_{r, 8} X_{34 i} + β_{r, 9} X_{35 i}}^{\text{seasons}} \\ + \overbrace{β_{r, 10} X_{2 i} X_{31 i} + \dots + β_{r, 14} X_{2 i} X_{35 i}}^{\text{pre-grazing} \times \text{seasons}}, \end{matrix}

where $b_{i} \sim N (0, d)$, and the $X$ variables are dummy covariates for blocks $(X_{11 i}, \dots, X_{13 i})$, pre-grazing management $(X_{2 i})$ and seasons $(X_{31 i}, \dots, X_{35 i})$. To ensure identifiability, we take the last level of each covariate as the reference (Block 4 $= 0$, Pre:maximum $= 0$ and Summer2 $= 0$), and for the CM we set $ϑ = e^{δ}$ and $ξ = 1$.

The results of both fitted models are presented in Table 7. The estimates are very similar, but the CM shows a reduction in the variance component, a pronounced value of the overdispersion parameter $(δ = 1.374)$ and a clear improvement in the log-likelihood. Note that several parameter estimates have shifted somewhat in the CM relative to the GLMM. This is to be expected to some extent: the more flexible way in which the CM handles variability and within-unit correlation, combined with the mean-variance link, implies that parameter estimates may change. Against this background, the shift in the logit 2 intercept (from -0.053 to 0.325) is of the same magnitude as that in logit 1 (from -2.826 to -2.467), but it happens to cross zero.
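To make the subject-specific interpretation more concrete, the following sketch converts the two baseline-category logits into category probabilities. It is an illustration only: it plugs in the GLMM point estimates from Table 7, fixes the random intercept at $b_{i} = 0$ (an assumption made purely for illustration), and the helper name `category_probs` is hypothetical rather than part of the SAS implementation.

```python
import math

def category_probs(eta1, eta2):
    """Map baseline-category logits to probabilities (weed, bare ground, tussock).

    eta1 = ln(pi_1 / pi_3) and eta2 = ln(pi_2 / pi_3), with tussock as reference.
    """
    denom = 1.0 + math.exp(eta1) + math.exp(eta2)
    return (math.exp(eta1) / denom, math.exp(eta2) / denom, 1.0 / denom)

# GLMM estimates from Table 7, evaluated at b_i = 0 (illustrative assumption).
# Reference cell: Block 4, Pre: maximum, Summer2, so only the intercepts contribute.
p_reference = category_probs(-2.826, -0.053)

# Same block and season, but Pre(95%): add the pre-grazing main effect to each logit.
p_pre95 = category_probs(-2.826 + 0.725, -0.053 - 0.605)

print([round(p, 3) for p in p_reference])
print([round(p, 3) for p in p_pre95])
```

Because $b_{i}$ is fixed at zero, these are conditional probabilities for a typical point, not population-averaged ones.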

Table 7

Grazing management data. Parameter estimates (standard errors) for the regression coefficients in the GLMM and CM. Estimation was done by maximum likelihood using numerical integration over the normal and beta random effects, if present.

Effects	Par.	GLMM logit 1	GLMM logit 2	CM logit 1	CM logit 2
Intercept	$β_{r, 0}$	-2.826(0.332)	-0.053(0.129)	-2.467(0.392)	0.325(0.247)
Block 1	$β_{r, 1}$	0.623(0.190)	0.117(0.099)	0.736(0.198)	0.220(0.129)
Block 2	$β_{r, 2}$	0.567(0.190)	-0.024(0.098)	0.683(0.193)	0.032(0.124)
Block 3	$β_{r, 3}$	-0.681(0.252)	0.072(0.097)	-0.633(0.248)	0.169(0.120)
Pre(95%)	$β_{r, 4}$	0.725(0.364)	-0.605(0.168)	0.936(0.379)	-0.602(0.209)
Summer1	$β_{r, 5}$	0.514(0.370)	-0.810(0.170)	0.580(0.390)	-0.677(0.211)
Autumn	$β_{r, 6}$	-0.309(0.444)	-0.345(0.162)	-0.199(0.453)	-0.342(0.205)
Winter	$β_{r, 7}$	-0.136(0.426)	-0.364(0.163)	-0.044(0.438)	-0.352(0.206)
Early spring	$β_{r, 8}$	0.308(0.403)	-0.286(0.169)	0.286(0.432)	-0.217(0.216)
Late spring	$β_{r, 9}$	0.969(0.365)	-0.082(0.163)	1.029(0.395)	-0.023(0.218)
Pre(95%) × summer1	$β_{r, 10}$	-0.389(0.464)	0.898(0.243)	-0.347(0.487)	0.930(0.305)
Pre(95%) × autumn	$β_{r, 11}$	0.346(0.525)	0.279(0.237)	0.397(0.532)	0.274(0.295)
Pre(95%) × winter	$β_{r, 12}$	-0.701(0.551)	0.429(0.236)	-0.633(0.547)	0.397(0.290)
Pre(95%) × early spring	$β_{r, 13}$	-0.670(0.504)	0.120(0.243)	-0.697(0.530)	0.133(0.300)
Pre(95%) × late spring	$β_{r, 14}$	-1.678(0.496)	0.147(0.237)	-1.632(0.513)	0.132(0.301)
Random effect	$d$	0.020(0.042)		0.013(0.062)
Overdispersion	$δ$	-		1.374(0.437)
-2loglik		6602.7		6590.9
AIC		6664.7		6654.9
BIC		6803.0		6797.7
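The information criteria in Table 7 can be reproduced from the deviances. The sketch below assumes 31 parameters for the GLMM (15 regression coefficients per logit plus $d$) and 32 for the CM (adding $δ$), and takes the number of target points, 640, as the sample size in the BIC penalty. These counts are inferred from the table rather than stated in the text, so they should be read as assumptions.

```python
import math

def aic(minus2loglik, n_params):
    """Akaike information criterion from the deviance (-2 log-likelihood)."""
    return minus2loglik + 2 * n_params

def bic(minus2loglik, n_params, n_units):
    """Bayesian information criterion with a log(n_units) penalty per parameter."""
    return minus2loglik + n_params * math.log(n_units)

N_POINTS = 640  # assumed BIC sample size: number of target points

# (deviance from Table 7, assumed number of parameters)
models = {"GLMM": (6602.7, 31), "CM": (6590.9, 32)}

for name, (deviance, k) in models.items():
    print(name, round(aic(deviance, k), 1), round(bic(deviance, k, N_POINTS), 1))
```

Under these assumptions the computed values agree with the AIC and BIC rows of Table 7 to one decimal place.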

To compare the models, a likelihood-ratio test was used. The difference between the deviances is $6602.7 - 6590.9 = 11.8$; however, since the null hypothesis lies on the boundary of the parameter space, the reference distribution is a mixture of $χ^{2}$ distributions (Stram and Lee, 1994; Self and Liang, 1987). To test the hypothesis $H_{0} : θ_{i j} = 1$, the reference distribution is a 50:50 mixture of $χ_{0}^{2}$ (the degenerate chi-squared distribution at 0) and $χ_{1}^{2}$, often denoted as $χ_{0 : 1}^{2}$. Thus, we obtain $p = P (χ_{0 : 1}^{2} \geq 11.8) = 0.5 P (χ_{0}^{2} \geq 11.8) + 0.5 P (χ_{1}^{2} \geq 11.8) = 0.0003$, indicating that the inclusion of the overdispersion parameter was important.
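The mixture $p$-value can be checked with a few lines of code. The sketch below uses only the Python standard library and relies on the identity $P (χ_{1}^{2} \geq x) = \operatorname{erfc} (\sqrt{x / 2})$ for $x > 0$; the function names are hypothetical.

```python
import math

def chi2_1_sf(x):
    """Upper tail P(chi2_1 >= x) for x > 0, using chi2_1 = Z^2 with Z standard normal."""
    return math.erfc(math.sqrt(x / 2.0))

def mixture_pvalue(lrt):
    """p-value under the 50:50 mixture of chi2_0 (point mass at 0) and chi2_1."""
    if lrt <= 0:
        return 1.0  # both components put all their mass at or above 0
    # For lrt > 0 the degenerate component contributes P(chi2_0 >= lrt) = 0.
    return 0.5 * 0.0 + 0.5 * chi2_1_sf(lrt)

lrt_statistic = 6602.7 - 6590.9  # difference in deviances from Table 7
p_value = mixture_pvalue(lrt_statistic)
print(round(p_value, 4))
```

Using the naive $χ_{1}^{2}$ reference distribution instead would double this $p$-value, which does not change the conclusion here but can matter for borderline cases.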

Note that, when using the GEE approach in Menarin and Lara (2017), evidence of a significant post-grazing management effect was reported, whereas here neither the GLMM nor the CM confirmed this effect. For this experiment, overdispersion is likely to occur, since a field experiment is subject to several environmental changes and some types of vegetation can occur in an aggregated pattern inside paddocks.

8 Concluding remarks

In this article, we have proposed a model for overdispersed, repeated nominal data. The model combines the baseline-category logit assumption to handle the nominal nature of the outcome, with normal random effects in the linear predictor to deal with correlation across repeated measures, and beta random effects to flexibly account for overdispersion. Similar models were proposed by Molenberghs et al. (2007), Molenberghs et al. (2010), Molenberghs et al. (2012), Ivanova et al. (2014), and Molenberghs et al. (2017) for count data, binary and binomial data, time-to-event and ordinal outcomes. The model is easy to formulate and can be fitted using, for example, the SAS procedure NLMIXED.

A simulation study was conducted to examine the behaviour of the combined model relative to the more conventional GLMM. Both models performed well, but when there is a pronounced effect of overdispersion, or if the overdispersion effect is associated with high correlations between the repeated measurements, better performance was observed for the CM, mainly for the variance components.

We applied the GLMM and the CM to agricultural experimental data to model the probability of occurrence of three types of vegetation. Comparing the two models, we found evidence in favour of the CM. This means that, besides the random effect that accounts for the correlation between repeated measures, an extra overdispersion parameter was useful to accommodate the extra variability induced by environmental and biological changes.

Appendix A: Algebraic developments for the CM

The partially marginalized density function of the combined model was obtained by integrating analytically over the beta random effects, leaving the normal random effects untouched. To do this, we need to consider the category to which the outcome belongs in order to proceed with the integration over the beta random effect. To simplify notation, let us consider the case where three categories are analysed. We can rewrite this expression as

\begin{matrix} f (w_{r, i j} ∣ b_{r, i}) = \int {(θ_{i j} κ_{1, i j})}^{w_{1, i j}} {(θ_{i j} κ_{2, i j})}^{w_{2, i j}} \\ {(1 - θ_{i j} κ_{1, i j} - θ_{i j} κ_{2, i j})}^{1 - w_{1, i j} - w_{2, i j}} \frac{θ_{i j}^{ϑ - 1} {(1 - θ_{i j})}^{ξ - 1}}{B (ϑ, ξ)} d θ_{i j} . \end{matrix}

Thus, if the outcome belongs to the first category, that is, $W_{1, i j} = 1$ (where $W_{1, i j}$ equals 1 if $Y_{i j} = 1$ and 0 otherwise), the following expression is obtained:

\begin{array}{l} f (w_{1, i j} = 1 ∣ b_{r, i}) = \int_{0}^{1} {(θ_{i j} κ_{1, i j})}^{1} \frac{θ_{i j}^{ϑ - 1} {(1 - θ_{i j})}^{ξ - 1}}{B (ϑ, ξ)} d θ_{i j} \\ = \frac{κ_{1, i j}}{B (ϑ, ξ)} \int_{0}^{1} θ_{i j}^{(ϑ - 1) + 1} {(1 - θ_{i j})}^{ξ - 1} d θ_{i j} \end{array}

\begin{array}{l} = κ_{1, i j} \frac{B (ϑ + 1, ξ)}{B (ϑ, ξ)} \\ = κ_{1, i j} \frac{Γ (ϑ + 1) Γ (ξ)}{Γ (ϑ + ξ + 1)} \frac{Γ (ϑ + ξ)}{Γ (ϑ) Γ (ξ)} \\ = κ_{1, i j} ϑ \frac{Γ (ϑ) Γ (ξ)}{Γ (ϑ + ξ + 1)} \frac{Γ (ϑ + ξ)}{Γ (ϑ) Γ (ξ)} \\ = κ_{1, i j} ϑ \frac{Γ (ϑ + ξ)}{(ϑ + ξ) Γ (ϑ + ξ)} \\ = κ_{1, i j} \frac{ϑ}{ϑ + ξ} . \end{array}
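As a quick numerical sanity check of the last steps, the ratio $B (ϑ + 1, ξ) / B (ϑ, ξ)$ can be compared with $ϑ / (ϑ + ξ)$. The sketch below is only a check of the algebra, using the standard library log-gamma function; the helper names and the test values are hypothetical, apart from $ϑ = e^{1.374}$ and $ξ = 1$, which mirror the settings of Section 7.

```python
import math

def log_beta(a, b):
    """Logarithm of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_ratio(vartheta, xi):
    """B(vartheta + 1, xi) / B(vartheta, xi), evaluated on the log scale."""
    return math.exp(log_beta(vartheta + 1.0, xi) - log_beta(vartheta, xi))

# Arbitrary test values, plus vartheta = exp(1.374), xi = 1 as in Section 7.
test_values = [(0.5, 1.0), (3.0, 2.5), (math.exp(1.374), 1.0)]
for vartheta, xi in test_values:
    closed_form = vartheta / (vartheta + xi)
    print(vartheta, xi, beta_ratio(vartheta, xi), closed_form)

# Multiplicative factor applied to kappa under the Section 7 settings.
shrink_section7 = math.exp(1.374) / (math.exp(1.374) + 1.0)
```

With the Section 7 settings the factor is about 0.80, so the probabilities of the first two categories are scaled down by roughly that amount relative to $κ_{r, i j}$.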

The same result applies to $r = 2$, with $κ_{1, i j}$ replaced by $κ_{2, i j}$. For the last category $(r = 3)$, the expression is given by:

\begin{matrix} f (w_{3, i j} = 1 ∣ b_{r, i}) = \int_{0}^{1} {(1 - θ_{i j} κ_{1, i j} - θ_{i j} κ_{2, i j})}^{1} \frac{θ_{i j}^{ϑ - 1} {(1 - θ_{i j})}^{ξ - 1}}{B (ϑ, ξ)} d θ_{i j} \\ = \int_{0}^{1} \frac{θ_{i j}^{ϑ - 1} {(1 - θ_{i j})}^{ξ - 1}}{B (ϑ, ξ)} \\ - (θ_{i j} κ_{1, i j}) \frac{θ_{i j}^{ϑ - 1} {(1 - θ_{i j})}^{ξ - 1}}{B (ϑ, ξ)} - (θ_{i j} κ_{2, i j}) \frac{θ_{i j}^{ϑ - 1} {(1 - θ_{i j})}^{ξ - 1}}{B (ϑ, ξ)} d θ_{i j} \\ = 1 - κ_{1, i j} \frac{ϑ}{ϑ + ξ} - κ_{2, i j} \frac{ϑ}{ϑ + ξ} \\ = 1 - \frac{ϑ}{ϑ + ξ} (κ_{1, i j} + κ_{2, i j}) . \end{matrix}

Hence, the partially marginalized likelihood function of the combined model considering three categories is given by:

\begin{matrix} L (β, D, ϑ, ξ) = \prod_{i = 1}^{N} \int \prod_{j = 1}^{n_{i}} {(\frac{ϑ}{ϑ + ξ} κ_{1, i j})}^{w_{1, i j}} {(\frac{ϑ}{ϑ + ξ} κ_{2, i j})}^{w_{2, i j}} {(1 - \frac{ϑ}{ϑ + ξ} \sum_{h = 1}^{R - 1} κ_{h, i j})}^{1 - w_{1, i j} - w_{2, i j}} \\ \frac{1}{\sqrt{{(2 π)}^{n_{i}}}} \frac{1}{\sqrt{|D|}} exp (- \frac{1}{2} b_{r, i}^{T} D^{- 1} b_{r, i}) d b_{r, i} . \end{matrix}
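The structure of this likelihood can be illustrated with a small sketch for a single subject. Several simplifying assumptions are made and should not be read as the authors' implementation: a scalar random intercept $b_{i} \sim N (0, d)$ shared by both logits, $κ_{r, i j} = \exp (η_{r, i j} + b_{i}) / \{1 + \sum_{h} \exp (η_{h, i j} + b_{i})\}$, and a fixed three-point Gauss-Hermite rule in place of the adaptive quadrature used in NLMIXED. All function names and input values are hypothetical.

```python
import math
from itertools import product

# Three-point Gauss-Hermite rule for a standard normal z:
# E[g(z)] is approximated by sum_k w_k * g(z_k); far coarser than one would use in practice.
GH_NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
GH_WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)

def cm_probs(eta, b, vartheta, xi):
    """Category probabilities given b, after integrating out the beta random effect."""
    expo = [math.exp(e + b) for e in eta]
    denom = 1.0 + sum(expo)
    shrink = vartheta / (vartheta + xi)  # mean of a Beta(vartheta, xi) variable
    first = [shrink * e / denom for e in expo]
    return first + [1.0 - sum(first)]  # last entry: reference category

def subject_likelihood(etas, y, d, vartheta, xi):
    """Marginal likelihood contribution of one subject.

    etas: per-occasion tuples of fixed-effect predictors (one per non-reference logit);
    y: observed categories coded 0, ..., R-1, with R-1 the reference; d: variance of b.
    """
    total = 0.0
    for z, w in zip(GH_NODES, GH_WEIGHTS):
        b = math.sqrt(d) * z
        contrib = 1.0
        for eta_j, y_j in zip(etas, y):
            contrib *= cm_probs(eta_j, b, vartheta, xi)[y_j]
        total += w * contrib
    return total

# Hypothetical inputs: two occasions, three categories.
etas = [(-2.0, 0.3), (-1.5, -0.2)]
total_mass = sum(
    subject_likelihood(etas, y, d=0.5, vartheta=math.exp(1.374), xi=1.0)
    for y in product(range(3), repeat=2)
)
print(total_mass)
```

Summing the contribution over all possible outcome sequences should give a total of one, which is a convenient check that the conditional probabilities and the quadrature weights are coded consistently.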

Supplementary materials

Supplementary materials for this article are available online.

Supplemental Material for A combined overdispersed longitudinal model for nominal data by Ricardo K. Sercundes, Geert Molenberghs, Geert Verbeke, Clarice G.B. Demétrio, Sila C. da Silva, and Rafael A. Moral, in Statistical Modelling

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

RKS was supported by CAPES and CNPq (proc. no. 233554/2014-9), Brazil. CGBD and SCS were supported by CNPq, Brazil.

References

Agresti (2010) Categorical data analysis. Wiley, New York.

Clayton (1992) Repeated ordinal measurements: a generalised estimating equation approach. Medical Research Council Biostatistics Unit Technical Report, pages 1–11.

Demétrio CGB, Hinde and Moral (2014) Models for overdispersed data in entomology. In Ecological modeling applied to entomology. Springer.

Diggle, Heagerty, Liang and Zeger (2002) Analysis of longitudinal data. Oxford University Press, New York.

Grunwald, Bruce, Jiang, Strand and Rabinovitch (2011) A statistical model for under- or overdispersed clustered and longitudinal count data. Biometrical Journal, 53, 578–594.

Hartzel, Agresti and Caffo (2001) Multinomial logit random effects models. Statistical Modelling, 1, 81–102.

Hedeker (2003) A mixed-effects multinomial logistic regression model. Statistics in Medicine, 1446, 1433–1446.

Hinde and Demétrio CGB (1998) Overdispersion: Models and estimation. Computational Statistics and Data Analysis, 27, 151–170.

Ivanova, Molenberghs and Verbeke (2014) A model for overdispersed hierarchical ordinal data. Statistical Modelling, 14, 399–415.

Lara IARD, Hinde, Castro ACD and da Silva IJO (2017) A proportional odds transition model for ordinal responses with an application to pig behaviour. Journal of Applied Statistics, 4763, 1031–1046.

Lee and Nelder (1996) Hierarchical generalized linear models (with discussion). Journal of the Royal Statistical Society, Series B, 58, 619–678.

Lee and Nelder (2001) Hierarchical generalized linear models: a synthesis of generalized linear models, random-effect models and structured dispersions. Biometrika, 88, 987–1006.

Lee and Nelder (2003) Extended-REML estimators. Applied Statistics, 30, 845–856.

Liang and Zeger (1986) Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.

Lipsitz, Kim and Zhao (1994) Repeated categorical data using generalized estimating equations. Statistics in Medicine, 13, 1149–1163.

McCullagh and Nelder (1983) Generalized linear models. Chapman & Hall, London, 1st edition.

Menarin and Lara IAR (2017) Longitudinal model for categorical data applied in an agriculture experiment about elephant grass. Scientia Agricola, 74, 265–274.

Molenberghs and Verbeke (2005) Models for discrete longitudinal data. Springer-Verlag, New York.

Molenberghs, Verbeke and Demétrio CGB (2017) Hierarchical models with normal and conjugate random effects: a review. SORT Statistics and Operations Research Transactions, 41, 191–254.

Molenberghs, Verbeke and Demétrio CGB (2007) An extended random-effects approach to modeling repeated, overdispersed count data. Lifetime Data Analysis, pages 513–531.

Molenberghs, Verbeke, Demétrio CGB and Vieira AMC (2010) A family of generalized linear models for repeated measures with normal and conjugate random effects. Statistical Science, 25, 325–347.

Molenberghs, Verbeke, Iddi and Demétrio CGB (2012) A combined beta and normal random-effects model for repeated, overdispersed binary and binomial data. Journal of Multivariate Analysis, 111, 94–109.

Morel and Koehler (1995) A one-step Gauss-Newton estimator for modelling categorical data with extraneous variation. Journal of the Royal Statistical Society, Series C (Applied Statistics), 44, 187–200.

Morel and Nagaraj (1993) A finite mixture distribution for modelling multinomial extra variation. Biometrika, 80, 363–371.

Morris and Sellers (2022) A flexible mixed model for clustered count data. Stats, 5, 52–69.

Mosimann (1962) On the compound multinomial distribution, the multivariate β-distribution, and correlations among proportions. Biometrika, 49, 65–82.

Neerchal and Morel (1998) Large cluster results for two parametric multinomial extra variation models. Journal of the American Statistical Association, 93, 1078–1087.

Nelder and Wedderburn RWM (1972) Generalized linear models. Journal of the Royal Statistical Society, Series A, 135, 370–384.

Pereira LET, Paiva, Geremia and Silva (2015a) Grazing management and tussock distribution in elephant grass. Grass and Forage Science, 70, 406–417.

Pereira LET, Paiva, Geremia and Silva (2015b) Regrowth patterns of elephant grass (Pennisetum purpureum Schum) subjected to strategies of intermittent stocking management. Grass and Forage Science, 70, 195–204.

Pinheiro and Bates (1995) Approximations to the log-likelihood function in the nonlinear mixed-effects model. Journal of Computational and Graphical Statistics, 4, 12–35.

Ribeiro Jr, Zeviani, Bonat, Demétrio and Hinde (2020) Reparametrization of COM-Poisson regression models with applications in the analysis of experimental data. Statistical Modelling, 20, 443–466.

Self and Liang (1987) Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. Journal of the American Statistical Association, 82, 605–610.

Stram and Lee (1994) Variance components testing in the longitudinal mixed effects model. Biometrics, 50, 1171–1177.

Touloumis, Agresti and Kateri (2013) GEE for multinomial responses using a local odds ratios parameterization. Biometrics, 69, 633–640.
