Sage Journals: Discover world-class research

Abstract

Motivated by measurement errors in the radiographic diagnosis of osteoarthritis, we propose a Bayesian approach to identify latent classes in a model with a continuous response subject to a monotonic (that is, non-decreasing or non-increasing) process with measurement error. A latent class linear mixed model is used to account for measurement error, while the monotonic process is accounted for via truncated normal distributions. The main purpose is to classify the response trajectories into latent classes to better describe disease progression within homogeneous subpopulations.

Keywords

Bayesian analysis, disease trajectories, latent class linear mixed models, measurement error, monotonic continuous process

1. Introduction

Osteoarthritis (OA) is a chronic, debilitating joint disease with no cure and a global prevalence of $\approx$ 654 million people aged 40 and over; it disproportionately affects women and overweight people.¹ Some authors point out that the economic and societal burden of the disease is rising worldwide, fueled primarily by the aging population.² Current treatments are limited and may be inadequate or unavailable for a significant portion of patients, who have to cope with a deteriorating quality of life, including lasting pain.

Given the increasing number of patients and rising costs, it is paramount to differentiate groups of patients according to their disease progression over time. In statistical terms, this differentiation can be achieved by identifying trajectory groups (or classes) according to their longitudinal trends. Latent class linear mixed models (LCLMMs)^3,4 provide a useful statistical approach to identify such classes by defining subpopulations that could in turn be linked to differences in other individual characteristics. Identifying subpopulations with distinct disease trajectories will aid in a more strategic use of resources for OA management. For instance, patients classified with a ‘stable non-deteriorating’ trajectory could be minimally treated or offered maintenance approaches, whereas patients with a ‘rapid-progressing’ trajectory could be targeted for novel interventions or followed up more regularly.

The LCLMM combines attributes of the linear mixed model (LMM) with a categorical latent variable, that is, a latent class, which allows a heterogeneous population with different trajectories over time to be split into subpopulations with more homogeneous observations. Briefly, the LMM explains dependency or correlation among repeated outcomes of individuals, for example, individual trajectories over time in longitudinal data, by incorporating a set of covariates into the formulation to determine their possible fixed and random effects. In LMMs, the fixed effects account for the overall impact of specified covariates, whereas the random effects account for variation within and between groups.⁵ At the same time, different classes of latent trajectories aim to capture heterogeneity in a population with different growth curve shapes. The LCLMM combines the underlying subpopulations through the use of a finite mixture.

OA most commonly presents in the knee joints, and radiography, that is, x-rays, is typically used to diagnose structural changes. Kellgren and Lawrence⁶ introduced a radiographic grading system that became widely used to assess the severity of knee OA. The Kellgren–Lawrence (KL) score is a discrete grade ranging from 0 to 4, with higher grades indicating increasing OA severity. The KL grades result from a classification system involving four radiographic features: joint space width, osteophyte formation, presence of sclerotic walls and bone deformity. Of these features, joint space width (JSW) and its narrowing over time are likely to capture disease progression with greater granularity than the other features, which denote the presence/absence of a given characteristic.⁷

However, errors in accurately measuring JSW can be introduced in the diagnostic process by the grader, the machine or the image angle. Thus, modelling approaches aiming to describe disease trajectories based on JSW should take these measurement errors into account, as they may raise different problems. For example, Carroll et al.⁸ mentioned that measurement error can cause bias in parameter estimation, lead to a loss of power for detecting relationships among variables, and mask data features. The measurement error problem can therefore be tackled using models with a general common structure, or ‘ingredients of a measurement error problem’:⁹ a model for the true value (a statistical model); a measurement error model (a specification of the relationship between the true and observed values); and extra data (information or assumptions needed to correct for measurement error, e.g. constraints or informative prior distributions for the measurement error parameters). To address this challenge for modelling JSW, we extend the use of monotonic constraints (as the ‘extra data’ ingredient), which have previously been used for hidden Markov models,^10,11 to the context of LCLMMs for data arising from a heterogeneous population subject to measurement errors.

To be specific, we propose a Bayesian approach to identify latent classes based on the trajectories of longitudinal continuous response data subject to monotonicity and measurement error. The novelty of the LCLMM proposal is twofold, as it (1) preserves monotonic patterns and (2) addresses measurement errors in the continuous responses. The main purpose of incorporating these features is to aid in identifying the latent classes that best describe the different progression trends across subpopulations.

The remainder of this article is organized as follows. The motivating data are introduced in Section 2. The proposed approach is presented in Section 3, which introduces a LCLMM subject to a monotonic constraint and measurement error. Bayesian analysis is explored in Section 4. Analysis results for a simulation study and the motivating data are presented in Section 5. Discussion and concluding remarks can be found in Section 6.

2. Motivating data

The Osteoarthritis Initiative (OAI) is a cohort study that started in 2002 across multiple centers in the United States with the objective of pinpointing risk factors and biomarkers for knee OA. A total of 4796 participants were recruited into one of three subcohorts (progression, incidence and control) according to distinct inclusion criteria and were followed up for almost 10 years.¹² Briefly, the progression subcohort ( $\approx 29$ %) comprises participants with symptomatic radiographic knee OA, whereas the incidence subcohort ( $\approx 68$ %) comprises participants at higher risk of developing symptomatic radiographic knee OA based on clinicodemographic factors. The control subcohort ( $\approx 3$ %) comprises a limited number of participants with neither risk factors nor symptomatic radiographic knee OA. Rich data collection, including biospecimens (e.g. urine, serum and plasma), imaging (x-rays and magnetic resonance imaging) and self-administered questionnaires, was performed on at most six occasions in addition to baseline (every year from years 1 to 4 and every 2 years thereafter). The OAI has been approved by the Institutional Review Board for the University of California, San Francisco (FWA approval # 00000068; IRB approval # 10-00532), and its affiliates. The following datasets were used for analysis in this article: XRay[00-10], AllClinical[00-10], kxr_sq_bu[00-10], kxr_qjsw_duryea[00-10] and Enrollees, where the numbers in square brackets denote the available visits in the OAI, that is, 00, 01, $\dots$ , 10. All participants provided written informed consent to the OAI. Further study details can be found on the OAI website (https://nda.nih.gov/oai/study-details.html).

As previously mentioned, our main interest lies in identifying trajectory groups based on the reported medial minimum JSW (MCMJSW), measured in millimeters, which is subject to measurement error and follows a non-increasing process due to the chronic and progressive nature of OA. Here, we focus on a subset of participants in the progression subcohort with complete information across visit times ( $n = 505$ ). The response of interest is the total, that is, the sum, of MCMJSW across both left and right knees as a measure of overall knee structural state; the subjects’ trajectories are shown in Figure 1. Covariates of interest included baseline features such as age and biological sex, as well as time-varying features such as body mass index (BMI) and the maximum across knees of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) total score. Of note, the WOMAC questionnaire (one of the various self-administered questionnaires available) consists of 24 items divided into three subscales capturing pain, stiffness and physical function. The WOMAC total score ranges between 0 and 96, with 0 denoting the best health status and 96 the worst possible status.¹³ Baseline summaries of the response and covariates of interest are presented in Table 1. The studied participants represent a relatively older population (mean age 59.7 years), with a good balance of females and males (53% females) and elevated BMI (mean 29.7 kg/m²). Lastly, the average WOMAC total score was 23.5, which indicates a mildly deteriorated health status.

Figure 1.

Spaghetti plots for the subjects’ trajectories ( $n = 505$ ) in total (i.e. across both right and left knees) medial minimum joint space width (MCMJSW) measured in millimeters (mm).

Table 1.

Summary of the response and covariate features across 505 subjects at baseline. Continuous variables are summarized with means, standard deviations, medians, minima and maxima. Categorical variables are summarized with counts and percentages.

Feature	Summary	$n = 505$
Total MCMJSW (mm)	Mean (sd); Median (Min, Max)	8.2 (2.3); 8.2 (2.1, 16.0)
Age (years)	Mean (sd); Median (Min, Max)	59.7 (8.8); 59 (45, 79)
Biological sex, n (%)	Female; Male	270 (53); 235 (47)
Body mass index (BMI, kg/m²)	Mean (sd); Median (Min, Max)	29.7 (4.7); 29.4 (18.2, 44.2)
WOMAC total score (max across knees)	Mean (sd); Median (Min, Max)	23.5 (16.9); 20.9 (0.0, 85.0)

MCMJSW: medial minimum joint space width

3. The approach

The aim of the proposed approach is to identify latent classes that characterize the response trajectories. We consider a continuous latent monotonic non-increasing time-dependent process, which underlies an observed continuous response subject to measurement errors. The description of the approach is given in the following subsections.

3.1. Latent class linear mixed models

A LCLMM is a mixture of $K$ LMMs, where each one of the $K$ LMMs corresponds to one latent class. In other words, the model assumes that the population is heterogeneous and composed of $K$ groups of subjects characterized by $K$ mean trajectory profiles.

For the $i$th subject, let $c_{i k}$ be an indicator variable denoting whether subject $i$ belongs to class $k$, for $i = 1, \dots, N$ and $k = 1, \dots, K$; specifically,

c_{i k} = \begin{cases} 1 & \text{if subject } i \text{ is a member of class } k, \\ 0 & \text{if subject } i \text{ is not a member of class } k, \end{cases}

where $c_{i} = (c_{i 1}, \dots, c_{i K})^{'} \sim \text{Multinomial} (1, π_{i}^{'})$ with corresponding probabilities $π_{i} = (π_{i 1}, \dots, π_{i K})^{'}$. Of note, each subject belongs to only one latent class; thus, $c_{i}$ is an element of the canonical basis of $\mathbb{R}^{K}$. The probabilities $π_{i k}$ for each latent class are given by the multinomial logistic model:

π_{i k} = P (c_{i k} = 1 | v_{i}) = \frac{\exp (v_{i}^{'} α_{k})}{\sum_{j = 1}^{K} \exp (v_{i}^{'} α_{j})}, \quad k = 1, \dots, K

(1)

where $v_{i}$ is a vector of covariates determining class membership for subject $i$, and $α_{k}$ is a vector of class membership regression parameters for class $k$, with the constraint $α_{1} = 0$ for the first latent class.
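As an illustration of equation (1), the Python sketch below computes the class-membership probabilities $π_{i k}$ and draws a one-hot $c_{i}$. It is a minimal sketch, not the authors' R/JAGS implementation; the function names and the covariate vector $v_{i} = (30, 25)$ are hypothetical, while the coefficients $α_{k}$ are those of the simulation in Section 5.1.1.

```python
import math
import random


def class_probabilities(v, alphas):
    """Multinomial logistic class probabilities pi_ik, as in equation (1).

    v: covariate vector v_i of one subject
    alphas: list of K coefficient vectors alpha_k; alphas[0] should be all
            zeros (reference class, alpha_1 = 0)
    """
    # Linear scores v_i' alpha_k, one per class
    scores = [sum(vj * aj for vj, aj in zip(v, alpha)) for alpha in alphas]
    # Subtracting the largest score avoids overflow in exp() and leaves the
    # normalized probabilities unchanged
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def draw_membership(probs, rng=random):
    """Draw c_i ~ Multinomial(1, pi_i) and return it as a one-hot list."""
    u = rng.random()
    cumulative = 0.0
    k_i = len(probs) - 1  # fallback guards against floating-point round-off
    for k, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            k_i = k
            break
    return [1 if k == k_i else 0 for k in range(len(probs))]


# Coefficients from the simulation in Section 5.1.1; v_i is a made-up example
alphas = [(0.0, 0.0), (-0.2, 0.2), (0.1, -0.4)]
pi_i = class_probabilities((30.0, 25.0), alphas)
c_i = draw_membership(pi_i, random.Random(42))
```

Fixing $α_{1} = 0$ makes class 1 the reference category: with all scores equal (for example $v_{i} = 0$), every class receives probability $1/K$.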

Suppose that the response score is obtained for the $i$th subject, $i = 1, \dots, N$, at time points $\{t_{i}\}$. Without loss of generality, the subjects are assumed to have the same number of time points, denoted by $T$, that is, $t = 1, \dots, T$ for all $i$.

Let $W_{i t}$ be the true, that is, error-free, response score for the $i$th subject at time $t$, which is related to three sets of exogenous covariates, $x_{i t}$, $z_{i t}$ and $u_{i t}$, through the linear predictor $η_{i t}$, namely

η_{i t} = β_{0} + x_{i t}^{'} β + z_{i t}^{'} γ_{i} + u_{i t}^{'} λ_{k_{i}}

(2)

where $x_{i t}$ is an $M_{1}$-dimensional vector of covariates for the overall fixed effects, $z_{i t}$ is an $M_{2}$-dimensional vector of covariates associated with the random effects, and $u_{i t}$ is an $M_{3}$-dimensional vector of covariates for the class-specific fixed effects. Moreover, $β_{0}$, $β$, $γ_{i}$ and $λ_{k_{i}}$ are the regression coefficients for the overall intercept, fixed effects, random effects and class-specific fixed effects for class $k_{i}$, respectively, with a constraint on the first parameter of the first class-specific fixed effect to identify the intercept, that is, $λ_{11} = 0$; and $k_{i} \in \{1, 2, \dots, K\}$ is the latent class to which subject $i$ belongs. We highlight the difference between $k_{i}$ and $k$ in the model exposition below: the latter represents a generic value in $\{1, \dots, K\}$, as opposed to a specific class, which is the case for $k_{i}$, the value of $k$ for which $c_{i k} = 1$.
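The linear predictor (2) is a sum of inner products. The hypothetical helper below evaluates it for one subject and time point; class labels are 0-based, so `lambdas[0]` plays the role of $λ_{1}$. The coefficients follow the simulation in Section 5.1.1, whereas $x_{i t}$ and $γ_{i}$ are made-up values chosen only for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def linear_predictor(beta0, beta, gamma_i, lambdas, x_it, z_it, u_it, k_i):
    """eta_it = beta0 + x_it' beta + z_it' gamma_i + u_it' lambda_{k_i}, equation (2).

    lambdas: list of class-specific coefficient vectors; k_i is a 0-based
             index into it (class 1 of the text is index 0)
    """
    return beta0 + dot(x_it, beta) + dot(z_it, gamma_i) + dot(u_it, lambdas[k_i])


# Parameter values from the simulation in Section 5.1.1; x_it and gamma_i are made up
beta0, beta = 16.0, (0.2, 0.5)
lambdas = [(0.0, -0.05, -0.01), (4.8, -0.5, -0.01), (5.2, -1.2, -0.01)]
gamma_i = (0.0, 0.0)  # a subject exactly at the population average
x_it = (60.0, 1.0)

# Class-specific mean profiles over t = 1..4 with z_it = (1, t), u_it = (1, t, t^2)
etas = {
    k: [linear_predictor(beta0, beta, gamma_i, lambdas, x_it, (1, t), (1, t, t * t), k)
        for t in range(1, 5)]
    for k in range(3)
}
```

Because $λ_{11} = 0$, the class-1 intercept is absorbed by $β_{0}$, and the first entries of $λ_{2}$ and $λ_{3}$ act as intercept shifts relative to class 1.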

3.2. Monotonic continuous process

Assume that the response scores follow a monotonically non-increasing continuous process, that is, $W_{i 1} \geq W_{i 2} \geq \dots \geq W_{i, T - 1} \geq W_{i T},$ for all $i = 1, \dots, N .$ Here, $W_{i t}$ represents the true gradual process, which could be difficult to score quantitatively and is thus unobservable. Instead, let $Y_{i t}^{*}$ be the recorded variable, which is prone to measurement error and is described in Section 3.3.

The response variable $W_{i t}$ is treated as a latent variable and relates to the linear predictor $η_{i t}$ through normal and truncated normal distributions as follows:

\begin{aligned} W_{i 1} & \sim N (η_{i 1}, τ^{2}), t = 1 \end{aligned}

(3)

\begin{aligned} W_{i t} | W_{i, t - 1} = w_{i, t - 1} & \sim N (η_{i t}, τ^{2}) I [W_{i t} \leq w_{i, t - 1}], t = 2, \dots, T \end{aligned}

(4)

For the time points $t = 2, \dots, T$, the monotonic non-increasing nature of the continuous process, $W_{i, t - 1} \geq W_{i t}$, is enforced using truncated normal distributions, where $I [A] = 1$ if $A$ is true and $I [A] = 0$ otherwise. By construction, it is implicitly assumed that the continuous process satisfies a first-order Markov property. We further note that the monotonically non-decreasing case is analogous, with the indicator in (4) replaced by $I [W_{i t} \geq w_{i, t - 1}]$.
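Equations (3) and (4) can be simulated sequentially: draw $W_{i 1}$ from a normal distribution, then draw each $W_{i t}$ from a normal distribution truncated above at the previous value. The sketch below assumes inverse-CDF sampling with Python's `statistics.NormalDist` (one common choice, not necessarily the one used by the authors); the function names and linear predictor values are hypothetical, and $τ^{2} = 0.7$ is taken from Section 5.1.1.

```python
import random
from statistics import NormalDist


def draw_truncated_below(mean, sd, upper, rng=random):
    """Draw from N(mean, sd^2) restricted to values <= upper (inverse-CDF method)."""
    dist = NormalDist(mean, sd)
    p_upper = dist.cdf(upper)  # probability mass at or below the truncation point
    if p_upper < 1e-12:
        # Crude fallback for an extreme tail, where the CDF loses precision:
        # nearly all of the truncated mass sits just below `upper`
        return upper
    # Uniform on (0, p_upper), kept strictly positive because inv_cdf needs 0 < p < 1
    u = max(rng.random() * p_upper, 1e-300)
    # min() protects the constraint against floating-point error in inv_cdf
    return min(dist.inv_cdf(u), upper)


def simulate_latent_path(etas, tau2, rng=random):
    """Simulate a non-increasing path following equations (3)-(4).

    W_1 ~ N(eta_1, tau2); for t >= 2, W_t | W_{t-1} = w ~ N(eta_t, tau2) I[W_t <= w].
    """
    sd = tau2 ** 0.5
    path = [rng.gauss(etas[0], sd)]
    for eta_t in etas[1:]:
        path.append(draw_truncated_below(eta_t, sd, path[-1], rng))
    return path


# Illustrative linear predictors; tau^2 = 0.7 as in the simulation of Section 5.1.1
rng = random.Random(2024)
w_path = simulate_latent_path([28.4, 28.3, 27.9, 27.4], 0.7, rng)
```

Each draw depends only on the previous value, which is the first-order Markov property noted above.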

3.3. Measurement error

Let $Y_{i t}^{*}$ be the (observed) continuous response for the $i$ th subject at time $t$ measured with error, which may result in non-monotonic patterns. We assume that conditional on $W_{i t}$ , $Y_{i t}^{*}$ follows a normal distribution, that is

Y_{i t}^{*} | W_{i t} = w_{i t} \sim N (w_{i t}, σ_{k}^{2})

(5)

Equation (5) denotes a classical additive measurement error model,^9,8 that is, $Y_{i t}^{*} = W_{i t} + ε_{i t}$, where $ε_{i t} \sim N (0, σ_{k}^{2})$ and $ε_{i t}$ is independent of $W_{i t}$. Note that different variance parameters, $σ_{k}^{2} \neq σ_{k^{'}}^{2}$ for latent classes $k \neq k^{'}$, can be used to account for different degrees of error in each class. Furthermore, the variance $σ_{k}^{2}$ quantifies the measurement error in latent class $k$: if the data in latent class $k$ showed no measurement error, $σ_{k}^{2} = 0$ would be obtained, and the greater the measurement error for class $k$, the greater the variance $σ_{k}^{2}$.
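The classical additive error model (5) can be sketched in a few lines: adding independent $N(0, σ_{k}^{2})$ noise to a non-increasing latent path generally produces an observed series with some increases. The hypothetical helpers below add the noise and count increases between consecutive time points; with $T$ time points each subject contributes $T - 1$ such pairs. The latent path is made up, and the three variances are those of the simulation in Section 5.1.1.

```python
import random


def add_measurement_error(w_path, sigma2_k, rng=random):
    """Classical additive error, equation (5): y*_it = w_it + eps_it, eps_it ~ N(0, sigma_k^2)."""
    sd = sigma2_k ** 0.5
    return [w + rng.gauss(0.0, sd) for w in w_path]


def count_increases(y):
    """Number of consecutive pairs (t - 1, t) with y_t > y_{t-1}."""
    return sum(1 for prev, cur in zip(y, y[1:]) if cur > prev)


# A non-increasing latent path (made up) observed under the three class-specific
# error variances used in the simulation of Section 5.1.1
w_path = [10.0, 9.8, 9.8, 9.5, 9.1, 8.7, 8.6, 8.2]
rng = random.Random(1)
observed = {s2: add_measurement_error(w_path, s2, rng) for s2 in (1.2, 0.1, 0.3)}
increases = {s2: count_increases(y) for s2, y in observed.items()}
```

With $σ_{k}^{2} = 0$ the observed series equals the latent path and shows no increases, matching the interpretation of $σ_{k}^{2}$ given above.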

4. Bayesian analysis

The proposed full model is a LCLMM comprising equations (1) to (5) in Section 3. The prior and posterior distributions, as well as the necessary constraints, are described in the following subsections.

4.1. Prior elicitation

Some components of the prior distribution have been chosen to be conditionally conjugate. In general, normal prior distributions are used for the regression coefficients in the linear predictor $η_{i t}$, that is, $β_{0} \sim N (b_{0}, B_{0})$, $β \sim N_{M_{1}} (b, B)$, $γ_{i} \sim N_{M_{2}} (c, Γ)$ and $λ_{k} \sim N_{M_{3}} (d, D)$. The hyperparameters of the means are set to $b_{0} = 0$, $b = 0$, $c = 0$ and $d = 0$, while diagonal matrices are used for the covariance matrices of the fixed effects’ regression coefficients, that is, $B_{0} = 10$, $B = I_{M_{1}} \times 10$ and $D = I_{M_{3}} \times 10$. In addition, to improve the identifiability of the class-specific fixed-effects parameters, ordering constraints are imposed on the first class-specific fixed-effect parameter, that is, $λ_{21} < \dots < λ_{K 1}$, with $λ_{11} = 0$.

On the other hand, to ensure the identifiability of the parameters in the covariance matrix of the random effects’ regression coefficients, $Γ$, at least one variance must be set to a constant, following earlier work.¹⁴ Thus, the positive-definite matrix $Γ$ is parameterized through its Cholesky decomposition, $Γ = L L^{'}$, where $L$ is a lower triangular matrix with positive entries on the diagonal and unrestricted entries below the diagonal. Specifically, we set $L [1, 1] = 1$, $L [l, l] \sim Gamma (1, 1)$ for $l = 2, \dots, M_{2}$, and $L [l_{1}, l_{2}] \sim N (0, 1)$ for $l_{1} = 2, \dots, M_{2}$ and $l_{2} = 1, \dots, l_{1} - 1$. Lastly, $L [l_{1}, l_{2}] = 0$ for $l_{1} = 1, \dots, M_{2}$ and $l_{2} = l_{1} + 1, \dots, M_{2}$. Note that setting $L [1, 1] = 1$ ensures that $Γ [1, 1] = 1$, facilitating parameter identifiability.
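The Cholesky-based prior for $Γ$ can be sketched as a forward draw: fill $L$ according to the stated priors and form $Γ = L L^{'}$. This is an illustrative Python sketch with a hypothetical function name, assuming $Gamma(1, 1)$ means shape 1 and rate 1 (for shape 1 the scale is also 1, so the parameterization does not matter here). Indices are 0-based, so `chol[0][0]` corresponds to $L [1, 1]$.

```python
import random


def draw_random_effects_cov(m2, rng=random):
    """Draw Gamma = L L' under the Cholesky prior of Section 4.1.

    L[0][0] = 1 (fixed), L[l][l] ~ Gamma(1, 1) for l >= 1,
    L[l1][l2] ~ N(0, 1) below the diagonal, and 0 above it.
    """
    chol = [[0.0] * m2 for _ in range(m2)]
    chol[0][0] = 1.0  # fixes Gamma[1, 1] = 1 for identifiability
    for l1 in range(1, m2):
        # Gamma(1, 1) with shape 1 and rate 1 (equivalently scale 1) is an Exponential(1)
        chol[l1][l1] = rng.gammavariate(1.0, 1.0)
        for l2 in range(l1):
            chol[l1][l2] = rng.gauss(0.0, 1.0)
    # Gamma[a][b] = sum_j L[a][j] * L[b][j]
    cov = [[sum(chol[a][j] * chol[b][j] for j in range(m2)) for b in range(m2)]
           for a in range(m2)]
    return chol, cov


chol, cov = draw_random_effects_cov(2, random.Random(3))
```

Since the first row of $L$ is $(1, 0, \dots, 0)$, the product $L L^{'}$ always has $Γ [1, 1] = 1$, while the remaining entries stay free.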

Similarly, normal prior distributions are used for the regression coefficients in the logistic model in each latent class, that is, $α_{k} \sim N_{Q} (a, A)$ . Specifically, we set $α_{k q} \sim N (0, 9 / 4),$ $q = 1, \dots, Q$ , which provides relatively flat prior distributions after log transformations,¹⁵ with the constraint $α_{1} = 0$ . Moreover, inverse Gamma prior distributions are used for the measurement error’s variance parameters; that is, $σ_{k}^{2} \sim IG (s_{σ}, r_{σ}),$ in particular $\frac{1}{σ_{k}^{2}} \sim Gamma (0.01, 0.01) .$

In finite mixture models, informative prior distributions should be considered to ensure that observations are assigned to each mixture component. Following a previously proposed hierarchical prior distribution,¹⁶ we set the variance of the latent variable to follow an inverse Gamma distribution, or equivalently $\frac{1}{τ^{2}} \sim Gamma (e_{τ}, κ_{τ}),$ with $κ_{τ} \sim Gamma (s_{τ}, r_{τ} / R_{y})$ . Specifically, $e_{τ} = 2$ , $s_{τ} = 0.5$ , $r_{τ} = 10$ and $R_{y}$ is the range (the difference between the maximum and minimum) of the observed responses.
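A forward draw from the hierarchical prior on $τ^{2}$ makes the two-level structure explicit. The sketch assumes the shape/rate reading of $Gamma(a, b)$, as in JAGS's `dgamma`; since Python's `random.gammavariate` takes shape and scale, each rate is inverted. The function name and the three response values (the minimum, median and maximum in Table 1) are illustrative only.

```python
import random


def draw_tau2(y_obs, e_tau=2.0, s_tau=0.5, r_tau=10.0, rng=random):
    """Draw tau^2 from the hierarchical prior of Section 4.1.

    kappa ~ Gamma(s_tau, rate = r_tau / R_y) and 1 / tau^2 ~ Gamma(e_tau, rate = kappa),
    reading Gamma(a, b) as shape/rate. random.gammavariate takes shape/scale,
    so each rate is inverted before the call.
    """
    r_y = max(y_obs) - min(y_obs)  # R_y: range of the observed responses
    kappa = rng.gammavariate(s_tau, r_y / r_tau)  # scale = 1 / (r_tau / R_y)
    precision = rng.gammavariate(e_tau, 1.0 / kappa)  # scale = 1 / kappa
    return 1.0 / precision


rng = random.Random(11)
draws = [draw_tau2([2.1, 8.2, 16.0], rng=rng) for _ in range(1000)]
```

Scaling the rate of $κ_{τ}$ by $1/R_{y}$ ties the prior to the spread of the observed data, so the prior is not overly tight or diffuse regardless of the measurement units.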

4.2. Nonidentifiability problems and label switching

In a finite mixture model, three types of nonidentifiability problems could arise:¹⁷ (i) invariance of the likelihood under relabelling of the components, a phenomenon called label switching; (ii) potential overfitting, introduced when either one component is empty or two components are equal, which means that more components are defined than are actually needed; and (iii) generic nonidentifiability, for example, when different parameters describe the same density. Under a Bayesian framework, the posterior distributions of finite mixture and hidden Markov models inherit the likelihood invariance, affecting inference. It is worth noting that label switching might not occur when the data are well separated; however, this setting is unlikely for the problem at hand. Furthermore, prediction could be masked amidst the self-supervised learning task of the chosen model, as previously described.¹⁸

Although identifiability may be achieved by imposing artificial constraints on the parameter space, several authors have pointed out that this approach seldom provides a satisfactory solution. Furthermore, label switching is a prerequisite for Markov chain Monte Carlo (MCMC) convergence, as earlier work summarizes.¹⁹ To deal with the label switching problem, there are several relabelling proposals, which often involve post-processing the simulation output. Some use clustering-like tools²⁰ based on a modal region, ideally chosen before label switching occurs. Stephens²¹ proposed a decision-theoretic approach involving an optimization criterion that could depend on the starting point. Frühwirth-Schnatter²² used permutation samplers that in turn could be used to find suitable identifiability constraints. Papastamoulis and Iliopoulos²³ proposed the equivalence classes representatives (ECR) method, which finds the permutation that minimizes the simple matching distance (SMD) between a reference (‘true’) allocation and the permuted allocations. One must therefore choose a pivot as the reference allocation against which the others are compared, which makes the process pivot-dependent. Rodríguez and Walker¹⁹ used the fact that observations that are often allocated together must remain similar; hence, they used the observed data to inform the loss function within an ECR method. They obtain an initial pivot estimate using one modal region of the posterior distribution to calculate the mean and standard deviation of the allocated data, relaxing pivot dependence. Overall, the performance of each of the aforementioned algorithms will in some cases be affected by the initial point or pivot used in the optimization process, or the algorithm may entail a heavy computational burden. Hence, selecting one that performs well requires a careful analysis of the particular problem.
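The pivot-based criterion behind ECR can be sketched by brute force: for each MCMC draw, try every permutation of the $K$ labels and keep the one whose relabelled allocation has the smallest SMD to the pivot. This enumerates $K!$ permutations, which is cheap for the $K \le 4$ used in this article. It is a hypothetical illustration of the criterion, with made-up allocations and function names, not the `label.switching` implementation, which may solve the same minimization more efficiently.

```python
from itertools import permutations


def simple_matching_distance(a, b):
    """Number of positions at which two allocation vectors disagree."""
    return sum(1 for x, y in zip(a, b) if x != y)


def apply_permutation(z, perm):
    """Map each label l in allocation z to perm[l]."""
    return [perm[label] for label in z]


def ecr_relabel(allocations, pivot, K):
    """For each MCMC draw's allocation vector (labels 0..K-1), find the label
    permutation that minimizes the SMD to the pivot allocation.

    Returns the relabelled allocations and the permutation chosen per draw.
    """
    relabelled, perms = [], []
    for z in allocations:
        best = min(
            permutations(range(K)),
            key=lambda perm: simple_matching_distance(pivot, apply_permutation(z, perm)),
        )
        relabelled.append(apply_permutation(z, best))
        perms.append(best)
    return relabelled, perms


# Toy example: the second draw is a pure label switch of the pivot
pivot = [0, 0, 1, 1, 2, 2]
draws = [[0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 1], [1, 1, 0, 0, 2, 1]]
relabelled, perms = ecr_relabel(draws, pivot, K=3)
```

A pure label switch is mapped back exactly onto the pivot, while a draw that genuinely moves an observation keeps that difference after relabelling; this is also why the result depends on the chosen pivot.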

We use the R package label.switching, which includes eight relabelling methods.²⁴ These algorithms are applied as a post-process to relabel the latent classes in the fitted model. Some of them require the allocation conditional probability of observation $i$ to component $k$, expressed as

p_{i k} = \frac{π_{i k} \prod_{t = 1}^{T} f_{θ_{k}, ω} (y_{i t}^{*} | μ_{i t}, ς_{i t}^{2})}{\sum_{h = 1}^{K} π_{i h} \prod_{t = 1}^{T} f_{θ_{h}, ω} (y_{i t}^{*} | μ_{i t}, ς_{i t}^{2})},

where $f_{θ_{k}, ω}$ denotes the probability density function (pdf) of a Gaussian distribution indexed by parameters $θ_{k}$ and $ω$, with $μ_{i t} = E [Y_{i t}^{*} | θ_{k}, ω]$, $ς_{i t}^{2} = Var [Y_{i t}^{*} | θ_{k}, ω]$, $θ_{k} = (c_{i k} = 1, λ_{k_{i}}, σ_{k_{i}}^{2})$ and $ω = (β_{0}, β, Γ, τ^{2})$. In addition, the complete likelihood of the model is written as

L (θ, ω; y^{*}, c) = \prod_{i = 1}^{n} π_{i k_{i}} \prod_{t = 1}^{T} f_{θ_{k_{i}}, ω} (y_{i t}^{*} | μ_{i t}, ς_{i t}^{2})

See details in the Supplemental Material (Section S2).
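Computing $p_{i k}$ directly can underflow, because the product over $T$ densities becomes very small. A common remedy, sketched below, is to work on the log scale and normalize with the log-sum-exp trick. The means $μ_{i t}$ and variances $ς_{i t}^{2}$ are treated as given inputs (their derivation is in the Supplemental Material), and all names and numbers are hypothetical.

```python
import math


def normal_logpdf(y, mu, s2):
    """Log density of N(mu, s2) at y."""
    return -0.5 * (math.log(2.0 * math.pi * s2) + (y - mu) ** 2 / s2)


def class_loglik(y_star, mus, s2s):
    """Sum over t of Gaussian log densities for one subject under one class."""
    return sum(normal_logpdf(y, m, s) for y, m, s in zip(y_star, mus, s2s))


def allocation_probabilities(log_pi, loglik):
    """p_ik proportional to pi_ik * prod_t f(y*_it | mu_it, s2_it), normalized over k.

    log_pi: list of log pi_ik for one subject (length K)
    loglik: list of sum_t log f(y*_it | mu_it, s2_it) under each class (length K)
    """
    logs = [lp + ll for lp, ll in zip(log_pi, loglik)]
    top = max(logs)  # log-sum-exp: shift by the maximum before exponentiating
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    return [w / total for w in weights]


# One subject, two classes with equal prior probability (illustrative numbers)
y_star = [9.0, 8.7, 8.9]
loglik = [class_loglik(y_star, [9.0, 8.8, 8.6], [0.5] * 3),
          class_loglik(y_star, [7.0, 6.5, 6.0], [0.5] * 3)]
p_i = allocation_probabilities([math.log(0.5), math.log(0.5)], loglik)
```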

To ameliorate the identifiability issues, we propose the following strategy. Two variances are associated with the linear predictor in the latent process: the covariance matrix of the random effects, $Γ$, and the latent process variance, $τ^{2}$. We are aware of the challenges a method faces in exploring the space of partitions when classes overlap and when they do not. Therefore, after exploring different ways of restricting the aforementioned variances, the strategy that performed best in both scenarios consists of (i) fitting an initial model without random effects, $γ_{i}$, to identify the latent classes; (ii) addressing label switching in the finite mixture model as a post-process;²⁴ and (iii) once the labels have been assigned, fitting the full mixture model to estimate all parameters, letting $τ^{2} = 1$ and $Γ [1, 1] = 1$.

Note that by omitting the random effects in the first step of the proposed strategy, we reduce the variability of the model by considering a marginal model for longitudinal data. We focus on this population-level approximation, instead of an individual-level model, to identify the latent classes using the average estimates of the subpopulations. The second step consists of obtaining the unswitched latent class memberships using a relabelling post-process. Finally, the proposed model is estimated by incorporating the constraints needed to identify the parameters; see details in the Supplemental Material (Section S3).

4.3. Exploring posterior distributions

After label switching has been addressed, the likelihood function given the latent class labels can be defined using conditional independence assumptions, considering the observed variables $y^{*}$, the unobservable latent variables $w$ and the latent classes $c$, as follows:

\begin{aligned} L (w, β_{0}, β, γ, Γ, λ, τ^{2}, σ^{2} | y^{*}, c) \\ = \prod_{i = 1}^{n} {[\prod_{t = 1}^{T} p (y_{i t}^{*} | w_{i t}, σ_{k_{i}}^{2})] p (w_{i 1} | λ_{k_{i}}, ω) [\prod_{t = 2}^{T} p (w_{i t} | w_{i, t - 1}, ω, λ_{k_{i}})]} \end{aligned}

(6)

where $p (ξ)$ denotes the pdf of the distribution corresponding to $ξ$.
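Each subject's factor in (6) is a sum of three kinds of log terms: the measurement densities, the initial normal density and the truncated normal transition densities, where truncation divides the normal density by $P(W_{i t} \leq w_{i, t - 1})$. The sketch below evaluates it for given $w$, $η$ and variances; it is an illustration under those assumptions, with hypothetical names, and it assumes each $η_{i t}$ already includes the class-specific term $u_{i t}^{'} λ_{k_{i}}$.

```python
import math
from statistics import NormalDist


def normal_logpdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def subject_loglik(y_star, w, eta, tau2, sigma2_k):
    """Log of subject i's factor in likelihood (6), given the latent path w,
    the linear predictors eta and the class-specific error variance sigma_k^2."""
    # Measurement model (5): y*_it | w_it ~ N(w_it, sigma_k^2)
    ll = sum(normal_logpdf(y, wt, sigma2_k) for y, wt in zip(y_star, w))
    # First time point (3): w_i1 ~ N(eta_i1, tau^2)
    ll += normal_logpdf(w[0], eta[0], tau2)
    # Later time points (4): N(eta_it, tau^2) truncated to w_it <= w_i,t-1,
    # i.e. the normal density divided by P(W_it <= w_i,t-1); assumes that
    # probability does not underflow to zero
    tau = math.sqrt(tau2)
    for t in range(1, len(w)):
        if w[t] > w[t - 1]:
            return -math.inf  # outside the support: the monotonic constraint fails
        ll += normal_logpdf(w[t], eta[t], tau2)
        ll -= math.log(NormalDist(eta[t], tau).cdf(w[t - 1]))
    return ll


w_example = [10.0, 9.5, 9.0]
ll_example = subject_loglik(w_example, w_example, w_example, 1.0, 0.25)
```

A latent path that increases anywhere receives log-likelihood $-\infty$, which is how the truncation in (4) rules out non-monotonic values of $w$.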

The joint posterior distribution of the latent variables $w$, random effects $γ$ and parameters $β_{0}$, $β$, $Γ$, $λ$, $τ^{2}$ and $σ^{2}$ is obtained by combining the likelihood function (6) with the prior distributions defined in Section 4.1, and it is given by:

\begin{aligned} p (w, β_{0}, β, γ, Γ, λ, τ^{2}, σ^{2} | y^{*}, c) \\ \propto L (w, β_{0}, β, γ, Γ, λ, τ^{2}, σ^{2} | y^{*}, c) \times p (β_{0}) p (β) p (γ) p (Γ) p (λ) p (τ^{2}) p (σ^{2}) \end{aligned}

To estimate the model parameters, stochastic simulation methods such as MCMC sampling are required. The algorithm has been implemented in R and JAGS, and the source code can be downloaded from the GitHub repository outlined in the Supplemental Material.

5. Results

In this section, we evaluate the performance of the proposed model using simulated data with latent classes that exhibit both non-increasing monotonic trajectories and measurement errors in the response variable. Its utility is also demonstrated by fitting the model to the motivating data. Details are described below. The proposed model was fitted using JAGS 4.3.0 and its R package interface rjags v4-12.

5.1. Simulation example

5.1.1. Data generation

Performing a full-scale simulation study is difficult because of the many possible parameter values and data variations, which would make it computationally demanding and time-consuming. In this section, we performed an experimental simulation considering a feasible number of scenarios that vary key study design parameters (namely $N$, $T$ and $K$), repeating each scenario 100 times and averaging the results. Of note, we refrained from exploring multiple values for the model parameters ( $θ, ω$ ) due to the considerable growth in the number of scenarios this exploration would entail. Nonetheless, the chosen model parameters represent a non-trivial separation between classes, as illustrated below.

Samples of $N \in \{100, 200\}$ subjects are simulated, observed at $T \in \{4, 8\}$ time points, with three true latent classes. To classify subjects into one of the three latent classes, a multinomial logistic regression model is used with probabilities $(π_{i 1}, π_{i 2}, π_{i 3})$ in (1), where the class-membership covariates $v_{i}$ are simulated as $v_{i 1} \sim Gamma (39.93, 1.34)$ and $v_{i 2} \sim Gamma (1.93, 0.08)$, with regression coefficients $α_{1} = (0, 0)^{'}$, $α_{2} = (- 0.2, 0.2)^{'}$ and $α_{3} = (0.1, - 0.4)^{'}$ for latent classes 1, 2 and 3, respectively.

To compute the linear predictor (2), the fixed effects variables $x_{i t}$ are simulated as $x_{i t 1} \sim Gamma (46.02, 0.77)$ and $x_{i t 2} \sim Bernoulli (0.53)$ with regression coefficients $β = (0.2, 0.5)^{'}$ ; intercept $β_{0} = 16$ ; the random effects variables $z_{i t}$ are defined as $z_{i t 1} = 1$ and $z_{i t 2} = t$ with simulated random coefficients as $γ_{i} \sim {Normal}_{2} (0, Γ)$ , $Γ = (\begin{array}{cc} 0.5 & 0 \\ 0 & 0.5 \end{array})$ ; for each latent class the class-specific fixed effects $u_{i t}$ are generated as $u_{i t 1} = 1$ , $u_{i t 2} = t$ , and $u_{i t 3} = t^{2}$ with regression coefficients $λ_{1} = (0, - 0.05, - 0.01)^{'}$ , $λ_{2} = (4.8, - 0.5, - 0.01)^{'}$ and $λ_{3} = (5.2, - 1.2, - 0.01)^{'}$ for latent classes 1, 2 and 3, respectively.

The latent variables $w_{i t}$ are then generated from normal distributions following (3) for $t = 1$ , and truncated normal distributions following (4) for $t = 2, \dots, T$ , with $τ^{2} = 0.7$ . In Figure 2 (left), we show the monotonic non-increasing responses $w_{i t}$ , for a simulated data set with $N = 200$ subjects and three true classes. Notice that the data has been simulated in such a way that the classes overlap. Moreover, it can be observed that the specifications are such that the monotonic latent process is preserved for each latent class.

Figure 2.

Simulated profiles of response variables for a dataset with $N = 200$ subjects and three true latent overlapping classes. The classes are identified with mean profiles (solid thick lines) computed with loess smoothing method and 95% confidence intervals (shaded areas). Left: non-increasing response variables $w_{i t}$ . Right: response variables subject to measurement error $y_{i t}^{*}$ .

Finally, to simulate the response variable subject to measurement error, $y_{i t}^{*}$, normal distributions (5) are used, with $σ_{1}^{2} = 1.2$, $σ_{2}^{2} = 0.1$ and $σ_{3}^{2} = 0.3$ for latent classes 1, 2 and 3, respectively. In Figure 2 (right), we show the responses subject to measurement error, $y_{i t}^{*}$, for a simulated data set with $N = 200$ subjects and three true classes. Note that under these specifications, the variables $y_{i t}^{*}$ can exhibit increasing or decreasing patterns over time, without any specific restrictions.

Parameter and covariate values are fixed for all replicated data sets, and therefore, all replicates have the same linear predictor. What varies in each simulated sample is the latent variable subject to the monotonic constraint, which in turn also changes the observed response variable subject to measurement error.

5.1.2. Fitting evaluation

As described above, we simulated 100 data sets to assess the empirical properties of the proposed model; models without and with the monotonic constraint were fitted, varying the number of estimated classes, $K \in \{2, 3, 4\}$. We used high-performance multi-node computing clusters to expedite the numerical studies; each node comprised 32 or 40 cores, each with a clock speed of $\sim$ 2.3 GHz and 4 GB of memory.^25,26 Computation time for a set of 20 replicates varied with the simulation scenario, ranging from 1 h ( $N = 100; T = 4; K = 2$ ) to 8.5 h ( $N = 200; T = 8; K = 4$ ).

A total of 30,000 iterations were performed, with 15,000 burn-in iterations and a thinning factor of 15, generating four chains with the MCMC sampling algorithm. The potential scale reduction statistic (Rhat) and the effective number of samples for each parameter (n.eff) were calculated.^27,28 Under these specifications, the chains appear to have converged. As mentioned before, label switching was addressed using the R package label.switching; in particular, we focused on the ECR-1 method to match our findings in the motivating data.
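If the 30,000 iterations include the 15,000 burn-in, thinning by 15 retains $(30000 - 15000)/15 = 1000$ draws per chain, or 4,000 across the four chains. The sketch below computes the basic Gelman-Rubin potential scale reduction for one scalar parameter from such chains. The exact variant reported by the software used in the article (for example, with split chains or degrees-of-freedom corrections) is not stated, so treat this as an illustration of the idea; the function name is hypothetical.

```python
import math


def potential_scale_reduction(chains):
    """Basic Gelman-Rubin Rhat for one scalar parameter.

    chains: list of m chains, each a list of n post-burn-in, thinned draws
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # Between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)) / m
    # Pooled estimate of the posterior variance
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


same = potential_scale_reduction([[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]])
apart = potential_scale_reduction([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])
```

Values close to 1 suggest that the chains are sampling the same distribution; chains stuck in different regions, as in the second example, give values well above 1.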

Given our data simulation strategy, when $N = 200$ and $T = 8$, on average $N_{1} = 100.4$ (SD = $15.4$) subjects are classified in class 1, $N_{2} = 68.4$ (SD = $17.1$) in class 2 and $N_{3} = 31.1$ (SD = $7.0$) in class 3. Considering the increasing and decreasing changes between consecutive time points across the 1400 (200 $\times$ 7) patterns in the simulated variable $y_{i t}^{*}$ (with measurement error), class 1 exhibits, on average, $189.1$ (SD = $50.5$) increasing patterns out of a total of 698, class 2 shows $125.3$ (SD = $29.6$) out of a total of 475 and class 3 displays $59.0$ (SD = $15.0$) out of a total of 227.

In general, the number of latent classes is unknown. It can either be fixed according to prior information (if available) or be selected using information criteria for model assessment, such as those described by Gelman et al.²⁹ We used the R package loo,³⁰ which implements leave-one-out cross-validation (LOO) and the widely applicable information criterion (Watanabe-Akaike information criterion, WAIC) to estimate the expected log pointwise predictive density (ELPD).
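The WAIC part of this computation can be written directly from an $S \times n$ matrix of pointwise log-likelihood values, with $S$ posterior draws and $n$ observations, following the definitions in Gelman et al. The sketch below is simplified and is not the loo implementation. In particular, loo estimates LOO with Pareto-smoothed importance sampling, which is not reproduced here. The function name is hypothetical.

```python
import numpy as np


def waic(log_lik):
    """WAIC from an (S, n) array of pointwise log-likelihoods (rows: posterior draws)."""
    log_lik = np.asarray(log_lik, dtype=float)
    # log pointwise predictive density, computed stably via the log-sum-exp trick
    max_ll = log_lik.max(axis=0)
    lppd_i = max_ll + np.log(np.exp(log_lik - max_ll).mean(axis=0))
    # effective number of parameters: posterior variance of each pointwise log-likelihood
    p_waic_i = log_lik.var(axis=0, ddof=1)
    elpd = float(np.sum(lppd_i - p_waic_i))
    return {"elpd_waic": elpd, "p_waic": float(p_waic_i.sum()), "waic": -2.0 * elpd}
```

The returned `waic` is on the same $-2 \times$ ELPD scale as the values reported in Tables 2 and 5.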

The WAIC and LOO information criteria (each equal to $-2 \times$ ELPD) were computed for each model, and their averages over the 100 data sets are reported in Table 2. For example, for $N = 200$ and $T = 8$, the LCLMM selects, on average, $K = 4$, since both WAIC and LOO are lowest for that value. In contrast, the proposed model, which accounts for the monotonic constraint and measurement error, selects, on average, $K = 3$, the number of classes used to generate the data.

Table 2.
Simulation: average (SD) of the WAIC and LOO criteria for $K \in \{2, 3, 4\}$ latent classes over 100 data sets, for different values of $N$ and $T$, under the LCLMM (without monotonic constraint) and the proposed model (with monotonic constraint).

			WAIC		LOO
$K$	$N$	$T$	LCLMM	Proposal	LCLMM	Proposal
2	100	4	1301.1 (41.4)	3655.1 (2706.4)	1355.1 (38.8)	1991.7 (811.1)
3	100	4	1292.1 (44.0)	2984.3 (2456.9)	1336.9 (39.0)	1784.0 (734.3)
4	100	4	1270.7 (45.5)	2806.6 (2610.6)	1307.8 (42.2)	1722.5 (752.3)
2	100	8	2615.7 (77.3)	2372.2 (992.3)	2711.5 (70.4)	2367.9 (255.1)
3	100	8	2603.2 (70.8)	2558.8 (1915.0)	2698.6 (67.4)	2275.7 (338.9)
4	100	8	2590.9 (75.2)	3131.4 (3132.0)	2682.5 (72.8)	2350.9 (463.3)
2	200	4	2603.7 (67.3)	3705.6 (3039.7)	2712.1 (59.1)	2886.1 (920.3)
3	200	4	2591.8 (67.2)	3810.9 (2914.9)	2680.2 (60.1)	2850.2 (828.2)
4	200	4	2561.4 (62.7)	3405.9 (2294.7)	2627.1 (56.6)	2715.7 (656.5)
2	200	8	5244.5 (88.8)	4716.5 (765.1)	5443.2 (85.6)	4869.4 (401.5)
3	200	8	5221.8 (102.3)	4231.0 (970.8)	5414.2 (98.6)	4402.4 (392.0)
4	200	8	5156.8 (116.0)	4844.6 (2030.2)	5341.2 (115.4)	4447.7 (369.9)


LOO: leave-one-out cross-validation; WAIC: Watanabe-Akaike information criterion; LCLMM: latent class linear mixed model.

To further illustrate the findings, we compare the true classes with those estimated by the models when $K = 3$. Parameter estimation is also evaluated by comparing the true parameter values $\theta$ with their estimates $\hat{\theta}$. Finally, the mean absolute error (MAE) is calculated as the mean, over the 100 data sets, of the absolute difference between the true value and the posterior median estimate of the parameter obtained in each data set.
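The replicate-level summaries reported in Table 4 can be sketched as follows. Given one posterior median estimate of a parameter per replicate, the function returns the average, the SD, the 2.5% and 97.5% percentiles across replicates, and the MAE against the true value. The function name and the inputs in the test are hypothetical illustrations.

```python
import numpy as np


def replicate_summary(estimates, true_value):
    """Summaries across replicates of one parameter's posterior-median estimates."""
    est = np.asarray(estimates, dtype=float)
    lower, upper = np.percentile(est, [2.5, 97.5])
    return {
        "average": float(est.mean()),
        "sd": float(est.std(ddof=1)),
        "percentiles": (float(lower), float(upper)),
        "mae": float(np.mean(np.abs(est - true_value))),  # mean absolute error
    }
```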

Table 3 shows the average confusion matrices of true against estimated classes for different values of $N$ and $T$. When $N = 200$ and $T = 8$, the proposed model correctly classified, on average, about 145.6 (74.65 + 51.09 + 19.83) of the 200 subjects, whereas the model without monotonic constraint and measurement error correctly classified about 116.5 (66.81 + 41.31 + 8.37).
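The correct-classification count is the trace of each average confusion matrix. The short check below uses the proposed-model entries for $N = 200$, $T = 8$, copied from Table 3. The diagonal sums to about 145.6 of 200 subjects, roughly 72.8%.

```python
import numpy as np

# Average confusion matrix (rows: true class, columns: estimated class),
# proposed model, N = 200, T = 8, values taken from Table 3.
proposal = np.array([
    [74.65, 2.03, 23.75],
    [1.94, 51.09, 15.41],
    [6.36, 4.94, 19.83],
])

correct = np.trace(proposal)          # average number of correctly classified subjects
accuracy = correct / proposal.sum()   # proportion correctly classified
print(round(float(correct), 2), round(float(accuracy), 3))
```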

Table 3.

Simulation: average number (SD) of subjects by true class and estimated class ($K = 3$) over 100 data sets for different values of $N$ and $T$.

$N = 100, T = 4$		Estimated classes
LCLMM		1	2	3
True	1	35.31 (6.48)	3.02 (4.18)	17.67 (7.15)
Classes	2	3.54 (2.95)	19.04 (3.84)	4.63 (4.64)
	3	4.71 (4.75)	6.58 (2.99)	5.50 (5.96)
Proposal		1	2	3
True	1	54.77 (4.00)	0.96 (1.19)	0.27 (0.94)
Classes	2	6.21 (4.71)	20.97 (4.98)	0.03 (0.17)
	3	14.50 (4.40)	1.29 (1.83)	1.00 (3.42)
$N = 100, T = 8$		Estimated classes
LCLMM		1	2	3
True	1	38.49 (5.92)	10.16 (4.22)	7.35 (5.86)
Classes	2	6.77 (1.98)	18.97 (2.95)	1.47 (1.00)
	3	9.95 (2.46)	5.20 (1.74)	1.64 (2.07)
Proposal		1	2	3
True	1	41.99 (6.62)	0.98 (1.25)	13.03 (5.24)
Classes	2	1.36 (3.20)	20.35 (5.13)	5.50 (2.48)
	3	2.60 (1.73)	1.49 (1.81)	12.70 (2.68)
$N = 200, T = 4$		Estimated classes
LCLMM		1	2	3
True	1	64.65 (7.98)	3.57 (7.35)	41.13 (11.76)
Classes	2	5.36 (5.61)	41.55 (6.67)	15.85 (10.33)
	3	5.82 (6.20)	8.94 (4.55)	13.13 (8.49)
Proposal		1	2	3
True	1	107.07 (6.27)	1.79 (1.56)	0.49 (1.12)
Classes	2	12.51 (7.14)	50.09 (7.40)	0.16 (0.96)
	3	19.39 (8.75)	2.84 (2.64)	5.66 (8.61)
$N = 200, T = 8$		Estimated classes
LCLMM		1	2	3
True	1	66.81 (12.96)	8.37 (5.55)	25.25 (8.18)
Classes	2	21.81 (8.56)	41.31 (10.85)	5.32 (3.39)
	3	13.60 (4.97)	9.16 (2.85)	8.37 (2.91)
Proposal		1	2	3
True	1	74.65 (13.41)	2.03 (2.15)	23.75 (10.15)
Classes	2	1.94 (5.60)	51.09 (15.40)	15.41 (7.09)
	3	6.36 (3.32)	4.94 (4.41)	19.83 (6.53)

LCLMM: latent class linear mixed model.

Table 4 shows the average and standard deviation of the parameter estimates, together with the MAE with respect to the true values, for $N = 200$ and $T = 8$. The labels ‘1 constraint’ and ‘0 constraint’ indicate that the corresponding parameter was fixed at 1 or 0, respectively, rather than estimated. For most parameters, the proposed model yields estimates closer to the true values (smaller MAE) than the model without monotonic constraint and measurement error. However, the standard deviations are larger for most of the random-effect variance and class-specific parameters, likely because the proposed model introduces additional variance parameters to address measurement error.

Table 4.

Simulation: average (SD) and percentiles (2.5%, 97.5%) of the posterior median estimates and MAE for 100 data sets with $N = 200$ and $T = 8$.

$N = 200, T = 8$		LCLMM			Proposal
Parameter	True	Average (SD)	Percentiles	MAE	Average (SD)	Percentiles	MAE
$β_{0}$	16	17.804 (1.700)	(14.337, 20.614)	2.111	15.689 (1.256)	(13.381, 17.887)	1.067
$β$	0.2	0.204 (0.025)	(0.160, 0.256)	0.021	0.207 (0.019)	(0.174, 0.247)	0.017
	0.5	0.454 (0.461)	(−0.494, 1.190)	0.379	0.523 (0.307)	(0.002, 1.084)	0.246
$Γ_{11}$	0.5	1 constraint			1 constraint
$Γ_{12} = Γ_{21}$	0	−0.160 (0.015)	(−0.193, −0.131)	0.160	0.029 (0.053)	(−0.095, 0.126)	0.049
$Γ_{22}$	0.5	0.084 (0.009)	(0.069, 0.107)	0.415	0.193 (0.044)	(0.109, 0.283)	0.306
$λ_{1}$	0	0 constraint			0 constraint
	−0.05	−0.880 (0.103)	(−1.079, −0.687)	0.830	0.367 (0.386)	(−0.331, 1.197)	0.462
	−0.01	0.020 (0.010)	(0.001, 0.038)	0.030	−0.004 (0.059)	(−0.107, 0.126)	0.044
$λ_{2}$	4.8	1.578 (0.667)	(0.269, 3.005)	3.221	3.478 (0.644)	(2.175, 4.635)	1.325
	−0.5	−0.168 (0.228)	(−0.593, 0.240)	0.344	0.426 (0.334)	(−0.168, 1.123)	0.933
	−0.01	−0.020 (0.017)	(−0.051, 0.011)	0.016	−0.074 (0.028)	(−0.139, −0.026)	0.064
$λ_{3}$	5.2	1.622 (0.675)	(0.300, 3.053)	3.577	3.602 (0.659)	(2.305, 4.844)	1.597
	−1.2	−2.267 (0.288)	(−2.787, −1.810)	1.070	−1.485 (0.338)	(−2.136, −1.025)	0.336
	−0.01	0.073 (0.017)	(0.034, 0.108)	0.083	0.006 (0.022)	(−0.036, 0.055)	0.021
$σ^{2}$	1.2				1.202 (0.095)	(1.003, 1.350)	0.073
	0.1	No error			0.099 (0.021)	(0.068, 0.131)	0.012
	0.3				0.625 (0.231)	(0.146, 1.017)	0.343
$τ^{2}$	0.7	1.147 (0.115)	(0.944, 1.385)	0.447	1 constraint

LCLMM: latent class linear mixed model; MAE: mean absolute error.

5.2. OA data

This section presents the analysis of the motivating data. We fit the two models outlined before, namely the LCLMM without monotonic constraint and the proposed model with the constraint, to the OAI data. Potential label switching was again addressed using the R package label.switching. After evaluating the available methods, we focus on the results from the first iterative version of the equivalence classes representatives (ECR-1) algorithm, which uses the simulated allocation variables and is initialized with a randomly selected pivot, because in this case it proved less dependent on starting values than the other approaches.^23,19
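The relabeling step can be illustrated with a simplified sketch in the spirit of the ECR algorithm. Each MCMC draw of the allocation vector is relabeled by the permutation that maximizes agreement with a pivot allocation. The pivot is then replaced by each subject's modal relabeled class, and the process repeats until the pivot stops changing. This is our own minimal reading for illustration, not the label.switching implementation of ECR-1, and the function names are hypothetical. The brute-force search over all $K!$ permutations is practical only for small $K$; production implementations solve the equivalent assignment problem instead.

```python
from collections import Counter
from itertools import permutations


def best_permutation(z, pivot, n_classes):
    """Relabeling perm (perm[old_label] = new_label) maximizing agreement with the pivot."""
    best_perm, best_score = None, -1
    for perm in permutations(range(n_classes)):
        score = sum(perm[zi] == pi for zi, pi in zip(z, pivot))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm


def ecr_relabel(draws, pivot, n_classes, max_iter=50):
    """ECR-style relabeling with a pivot update by modal allocation (simplified sketch)."""
    pivot = list(pivot)
    for _ in range(max_iter):
        perms = [best_permutation(z, pivot, n_classes) for z in draws]
        relabeled = [[perm[zi] for zi in z] for z, perm in zip(draws, perms)]
        # new pivot: most frequent relabeled class of each subject across draws
        new_pivot = [Counter(col).most_common(1)[0][0] for col in zip(*relabeled)]
        if new_pivot == pivot:
            break
        pivot = new_pivot
    return perms, relabeled


# Illustrative usage: three draws of five allocations with permuted labels.
draws = [[0, 0, 1, 1, 2], [1, 1, 0, 0, 2], [2, 2, 1, 1, 0]]
perms, relabeled = ecr_relabel(draws, pivot=[0, 0, 1, 1, 2], n_classes=3)
```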

Age, biological sex, BMI and WOMAC total score were included as overall fixed effects ($x$); a linear trajectory (i.e. intercept and slope) formed the random effects ($z$); and a quadratic trajectory (intercept, slope and quadratic term) was used for the class-specific fixed effects ($u$). Age, sex and BMI at baseline were included as covariates determining class membership ($v$). As the number of classes is unknown, we evaluated models with $K \in \{2, \dots, 5\}$ classes. For both models, a total of 30,000 iterations were performed, with 15,000 burn-in iterations and a thinning factor of 15, running five MCMC chains. The overall computation time across all values of $K$ was 11.5 h on a single node with 40 processors. The Rhat values for the proposed model ranged between 1.00 and 1.09, with the larger values observed for the $λ$ parameters, whereas all Rhat values for the LCLMM were equal to 1.00.

The classes identified by both models for each value of $K$ are shown in Figure 3. Despite the dense set of trajectories, both models identified distinct trajectory classes, all of which appear non-increasing, as expected given the nature of the MCMJSW data. Figure 3 also shows that, without the monotonic constraint, almost all classes appear to be determined primarily by baseline values as $K$ increases, subdividing the population into seemingly parallel patterns. In contrast, the model with monotonic constraint identifies distinct, diverging trajectory patterns as $K$ increases, suggesting that the constraint helps identify more clinically relevant classes. This observation is further supported by the Sankey diagrams that track how the sample splits as the number of classes increases in each model (Figure 4). Notably, when $K = 5$, the proposed model yields only four non-empty classes, leaving one group empty, which again aids class identification.

Figure 3.

Spaghetti plots of the subjects’ trajectories ($n = 505$) in total (i.e. across both right and left knees) medial minimum joint space width (MCMJSW), measured in millimeters (mm). Coloured lines represent the trajectory clusters identified under the ECR-1 method for different values of $K$ for the latent class linear mixed model (LCLMM) without monotonic constraint (left) and with monotonic constraint (right). The numbers in parentheses represent the proportion of samples in each class.

Figure 4.

Sankey diagrams representing how the classes split across values of $K$ ranging from 2 to 5 under the ECR-1 method for the LCLMM without monotonic constraint (left) and with monotonic constraint (right). LCLMM: latent class linear mixed model; ECR: equivalence classes representatives.

To further ascertain the most adequate number of classes, the WAIC and LOO criteria were computed for each model using the loo package in R (Table 5). In theory, the model with the smallest information criterion value should be selected; in practice, other factors such as parsimony and interpretability can also be taken into account. Strictly, among the evaluated numbers of classes, $K = 5$ would be selected for both models. However, for the proposed model, we argue that $K = 3$ provides a better class resolution for three reasons: (1) it yields the second smallest WAIC and LOO values, (2) the solution with $K = 5$ leaves one class empty, which is undesirable, and (3) based on Figure 3, classes 4 and 5 denote analogous trajectory patterns when $K = 5$. In contrast, once $K = 5$ is discarded, the criteria for the LCLMM without constraint reach no consensus, with each class solution identifying somewhat different groups. Moreover, the proposed model fits the data better, as all of its information criterion values are smaller than the corresponding LCLMM values irrespective of the choice of $K$ (Table 5).

Table 5.

WAIC and LOO values for LCLMM (without monotonic constraint) and proposed (with monotonic constraint) models when the number of classes ( $K$ ) ranges between $2$ and $5$ under the ECR-1 method.

	WAIC		LOO
$K$	LCLMM	Proposal	LCLMM	Proposal
2	9920.87	8199.82	10411.07	8614.59
3	9735.32	6801.50	10222.50	7328.03
4	9770.05	6885.16	10219.89	7345.95
5	9724.57	6672.77	10192.98	7117.77

LOO: leave-one-out cross-validation; WAIC: Watanabe-Akaike information criterion; LCLMM: latent class linear mixed model.
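Ranking the Table 5 values makes the selection argument explicit. $K = 5$ is smallest for both models under both criteria. For the proposed model, $K = 3$ is the runner-up under both WAIC and LOO. For the LCLMM, the runner-up differs between WAIC ($K = 3$) and LOO ($K = 4$). The dictionary below simply transcribes Table 5, and the helper name is hypothetical.

```python
# WAIC and LOO values from Table 5, indexed by the number of classes K.
table5 = {
    "LCLMM": {"WAIC": {2: 9920.87, 3: 9735.32, 4: 9770.05, 5: 9724.57},
              "LOO": {2: 10411.07, 3: 10222.50, 4: 10219.89, 5: 10192.98}},
    "Proposal": {"WAIC": {2: 8199.82, 3: 6801.50, 4: 6885.16, 5: 6672.77},
                 "LOO": {2: 8614.59, 3: 7328.03, 4: 7345.95, 5: 7117.77}},
}


def rank_k(values):
    """Return K values ordered from smallest (preferred) to largest criterion value."""
    return sorted(values, key=values.get)


for model, criteria in table5.items():
    for name, values in criteria.items():
        print(model, name, rank_k(values))
```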

Therefore, we identify three distinct trajectory classes under the proposed model: (1) a stable disease, (2) an initially stable disease with a sudden drop midway and (3) a rapidly deteriorating disease. The parameter estimates for this model are displayed in Table 6. Based on the 95% credible intervals, all overall fixed effects in the proposed model except BMI display non-null effects, whereas all of them are non-null in the LCLMM; nevertheless, their posterior means are broadly comparable between models. The variance components for the random effects indicate that the random intercept and slope are roughly uncorrelated in both models. The main discrepancy between the models lies in the class-specific fixed effects: the LCLMM exhibits null effects for all estimated class 1 terms ($λ_{1}$) and the class 3 slope ($λ_{31}$), whereas all class-specific fixed effects in the proposed model are non-null. Lastly, one of the class-specific measurement error variances ($σ_{3}^{2}$) is considerably larger than the others, supporting the need to account for class-specific measurement errors.
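The "non-null" judgements above amount to checking whether a 95% credible interval excludes zero. The sketch below assumes equal-tailed intervals computed from posterior draws; the interval type is not stated in this section. The function names are hypothetical. In the test, the draws are synthetic normal samples that only mimic the reported means and SDs for BMI and WOMAC; they are not the actual posterior output.

```python
import numpy as np


def equal_tailed_ci(draws, level=0.95):
    """Equal-tailed credible interval from posterior draws of a scalar parameter."""
    alpha = 1.0 - level
    lower, upper = np.quantile(np.asarray(draws, dtype=float), [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


def is_non_null(draws, level=0.95):
    """True when the credible interval excludes zero."""
    lower, upper = equal_tailed_ci(draws, level)
    return lower > 0 or upper < 0
```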

Table 6.

Mean, standard deviation (SD) and 95% credible interval (CI) from the MCMC posterior samples for LCLMM (without monotonic constraint) and proposed (with monotonic constraint) when $K = 3$ .

		LCLMM			Proposal
Parameter	Variable	Mean	SD	95% CI	Mean	SD	95% CI
$β_{0}$	Intercept	13.205	0.452	(12.3, 14.1)	13.811	0.559	(12.6, 14.8)
$β$	Age	−0.067	0.006	(−0.0781, −0.0559)	−0.073	0.007	(−0.0855, −0.0595)
	Sex (ref. F)	0.705	0.101	(0.502, 0.903)	0.483	0.123	(0.244, 0.725)
	BMI	−0.027	0.009	(−0.0435, −0.00931)	−0.016	0.012	(−0.0402, 0.00878)
	WOMAC	−0.004	0.002	(−0.00653, −0.000481)	−0.021	0.003	(−0.028, −0.015)
$Γ$	$Γ_{11}$	1.0 constraint			1.0 constraint
	$Γ_{21} = Γ_{12}$	0.000	0.005	(−0.0107,0.0107)	−0.002	0.020	(−0.0404, 0.0377)
	$Γ_{22}$	0.027	0.003	(0.0224, 0.0327)	0.064	0.014	(0.0399, 0.0952)
$λ_{1}$	Intercept	0 constraint			0 constraint
	Slope	−0.056	0.031	(−0.115, 0.00349)	9.305	1.033	(7.56,11.6)
	Quadratic	−0.000	0.003	(−0.0065, 0.00664)	−1.066	0.131	(−1.35, −0.841)
$λ_{2}$	Intercept	−1.272	0.124	(−1.52, −1.03)	−1.015	0.142	(−1.29, −0.734)
	Slope	−0.523	0.033	(−0.588, −0.459)	0.333	0.068	(0.196, 0.468)
	Quadratic	0.008	0.004	(0.00149, 0.0154)	−0.084	0.009	(−0.101, −0.067)
$λ_{3}$	Intercept	0.380	0.159	(0.0733, 0.69)	0.852	0.213	(0.439, 1.27)
	Slope	−0.011	0.051	(−0.11, 0.0893)	1.724	0.316	(1.16, 2.44)
	Quadratic	−0.036	0.006	(−0.0474, −0.0254)	−0.244	0.039	(−0.33, −0.175)
$σ^{2}$	Class₁				0.207	0.009	(0.191, 0.224)
	Class₂	No error			0.190	0.014	(0.165, 0.219)
	Class₃				2.197	0.179	(1.88, 2.56)
$τ^{2}$		0.715	0.022	(0.674, 0.759)	1.0 constraint

LCLMM: latent class linear mixed model; BMI: body mass index; WOMAC: Western Ontario and McMaster Universities Arthritis Index; MCMC: Markov chain Monte Carlo.

6. Discussion

The proposed latent class linear mixed model accommodates both measurement error and monotonicity in a continuous process, two challenging aspects in model-based clustering. In the motivating data, measurement error can arise from variations in the diagnostician, the x-ray machine or knee positioning. The proposed model is not appropriate in all situations: JSW has special characteristics that make the truncation approach to monotonicity adequate. In particular, JSW has a natural boundary at zero, that is, a negative JSW is physically impossible. Moreover, due to the chronic nature of OA, the knee joint damage reflected by joint spacing is believed to be irreversible (and thus to follow a monotonic behaviour). Nonetheless, such a constraint can be used whenever there is a good rationale for sustained increasing or decreasing values in disease outcomes, which may occur in other chronic conditions.

We have shown in simulations that, in the presence of an underlying monotonic process, the parameters of the proposed model can be accurately recovered. Additionally, increasing the sample size and the number of time points improves model selection with WAIC and LOO. Moreover, the proposed Bayesian approach quantifies the uncertainty associated with the class probabilities, which can aid in assigning a particular subject to a potential treatment or intervention. For instance, low uncertainty in class membership may support a definite plan of action, whereas larger uncertainty may lead practitioners to refrain from proposing a more definite intervention (or lack thereof) and to suggest closer monitoring instead. Lastly, the motivating data suggest that introducing a monotonic constraint aids in identifying the number of classes in the model. Indeed, such a constraint incorporates additional information on the response behaviour that ultimately supports model evaluation.
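One way to turn the posterior uncertainty in class membership into a practical flag is sketched below. Each subject's class probabilities are estimated as the fraction of (relabeled) MCMC draws allocating that subject to each class. Subjects whose modal probability falls below a threshold are then flagged. The threshold rule and the function names are hypothetical illustrations, not a procedure used in the paper.

```python
import numpy as np


def membership_probabilities(allocations, n_classes):
    """Posterior class probabilities from an (S, n) array of relabeled allocations."""
    z = np.asarray(allocations)
    # fraction of draws assigning each subject to each class -> shape (n, K)
    return np.stack([(z == k).mean(axis=0) for k in range(n_classes)], axis=1)


def flag_uncertain(probs, threshold=0.8):
    """Hypothetical rule: flag subjects whose modal class probability is below threshold."""
    return probs.max(axis=1) < threshold
```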

Despite the aforementioned advantages, challenges remain. First, latent class models usually suffer from label switching because the likelihood is invariant to permutations of the class labels. We obtained stable results by post-processing the class labels. We chose this post-processing because strategies that only introduced parameter constraints led to unsatisfactory chain mixing and lack of convergence. However, ‘generic identifiability’ issues may remain due to the nature of mixture modelling, as previously reviewed.³¹ Even though we focused on the ECR-1 algorithm in this work, determining the most adequate label switching algorithm in more general settings with monotonic constraints deserves further investigation. Lastly, the computational cost of the model remains high, especially as the number of classes ($K$) and the values of $n$ and $T$ increase. In this regard, variational Bayesian methods and faster MCMC algorithms, for example, Hamiltonian Monte Carlo, could be leveraged to reduce the computational burden of the model.

The proposed model can be extended in several directions. First, as noted by one of the reviewers, the variance of the latent variables $W$, that is, $τ^{2}$, is assumed to be constant over time. In principle, this assumption could be relaxed to better reflect the behaviour of the analyzed data, for example, by introducing serial correlation. Second, another reviewer noted that a perhaps less stringent model could be formulated by inducing monotonicity through the linear predictors of the error-free latent variables $W_{i t}$, that is, $η_{i, t} = η_{i, t - 1} + ξ_{i, t}$ with $ξ_{i, t}$ a non-negative variable, instead of through the latent variables themselves. However, we anticipate additional challenges in identifiability and convergence from incorporating either of these changes into the formulation.
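The reviewer's alternative builds monotonicity into the linear predictor through non-negative increments, $η_{i, t} = η_{i, t - 1} + ξ_{i, t}$. The sketch below illustrates this construction for one subject. The half-normal distribution for $ξ_{i, t}$ and its scale are our assumptions; the function name is hypothetical. The resulting predictor is non-decreasing by construction. A non-increasing process would subtract the increments instead.

```python
import random


def monotone_predictor(eta_start, n_times, rng, scale=0.2):
    """Hypothetical eta_t = eta_{t-1} + xi_t with half-normal xi_t >= 0 (non-decreasing)."""
    eta = [eta_start]
    for _ in range(n_times - 1):
        xi = abs(rng.gauss(0.0, scale))  # non-negative increment (assumed half-normal)
        eta.append(eta[-1] + xi)
    return eta


# Illustrative usage for T = 8 occasions.
eta = monotone_predictor(1.0, 8, random.Random(3))
```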

In the motivating data, we focused on a subset of subjects with complete information. However, missing data commonly occur in longitudinal studies. In principle, the proposed LCLMM is robust to missing (completely) at random mechanisms. However, different considerations must be made under missing not at random (MNAR) mechanisms, which may influence the class probabilities.

Supplemental Material

sj-pdf-1-smm-10.1177_09622802231225963 - Supplemental material for A latent class linear mixed model for monotonic continuous processes measured with error

Supplemental material, sj-pdf-1-smm-10.1177_09622802231225963 for A latent class linear mixed model for monotonic continuous processes measured with error by Osvaldo Espin-Garcia, Lizbeth Naranjo and Ruth Fuentes-García in Statistical Methods in Medical Research

Footnotes

Acknowledgements

Data and/or research tools used in the preparation of this manuscript were obtained and analyzed from the controlled access datasets distributed from the Osteoarthritis Initiative (OAI), a data repository housed within the NIMH Data Archive (NDA). OAI is a collaborative informatics system created by the National Institute of Mental Health and the National Institute of Arthritis, Musculoskeletal and Skin Diseases (NIAMS) to provide a worldwide resource to quicken the pace of biomarker identification, scientific investigation and OA drug development. This research was enabled in part by support provided by the SciNet and SHARCNET HPC Consortia and the Digital Research Alliance of Canada (). Computations were performed on the Niagara and Graham supercomputers. SciNet is funded by Innovation, Science and Economic Development Canada; the Digital Research Alliance of Canada; the Ontario Research Fund: Research Excellence; and the University of Toronto. The authors thank the two anonymous reviewers for their comments and suggestions that helped enrich this work.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Lizbeth Naranjo and Ruth Fuentes-García have been supported by UNAM-DGAPA-PAPIIT, Mexico (Project IN100823). Lizbeth Naranjo has been supported by UNAM-DGAPA-PASPA scholarship.

ORCID iDs

Osvaldo Espin-Garcia

Lizbeth Naranjo

Supplemental material

Supplemental material for this article is available online. Software in the form of R and JAGS code is available at . The supplemental material contains additional results and simulation studies.

References

1. Cui, Wang, et al. Global, regional prevalence, incidence and risk factors of knee osteoarthritis in population-based studies. EClinicalMedicine 2020; 29-30: 100587.

2. Hawker, King. The burden of osteoarthritis in older adults. Clin Geriatr Med 2022; 38: 181–192.

3. Muthén, Asparouhov. Growth mixture modeling: Analysis with non-Gaussian random effects. In: Fitzmaurice G, Davidian M, Verbeke G et al. (eds) Longitudinal Data Analysis. Handbooks of Modern Statistical Methods. Boca Raton, Florida: Chapman & Hall/CRC, 2008, pp. 143–166.

4. Proust-Lima, Philipps, Liquet. Estimation of extended mixed models using latent classes and latent processes: the R package lcmm. J Stat Softw 2017; 78: 1–56.

5. McCulloch, Searle. Generalized, Linear, and Mixed Models. Wiley Series in Probability and Statistics. New York: John Wiley & Sons, 2001.

6. Kellgren, Lawrence. Radiological assessment of osteo-arthrosis. Ann Rheum Dis 1957; 16: 494–502.

7. Kohn, Sassoon, Fernando. Classifications in brief: Kellgren-Lawrence classification of osteoarthritis. Clin Orthop Relat Res 2016; 474: 1886–1893.

8. Carroll, Ruppert, Stefanski, et al. Measurement Error in Nonlinear Models: A Modern Perspective. 2nd ed. Boca Raton, Florida: Chapman & Hall/CRC, 2006.

9. Buonaccorsi. Measurement Error. London: Chapman & Hall/CRC, 2010.

10. Naranjo, Pérez, Fuentes-García, et al. A hidden Markov model addressing measurement errors in the response and replicated covariates for continuous nondecreasing processes. Biostatistics 2019; kxz004: 1–15.

11. Naranjo, Lesaffre, Pérez. A mixed hidden Markov model for multivariate monotone disease processes in the presence of measurement errors. Stat Modelling 2022; 22: 385–408.

12. Lester. The osteoarthritis initiative: a NIH public–private partnership. HSS J 2012; 8: 62–63.

13. McConnell, Kolopack, Davis. The Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC): a review of its utility and measurement properties. Arthritis Rheum 2001; 45: 453–461.

14. Curtis. BUGS code for item response theory. J Stat Softw, Code Snippet 2010; 36: 1–34.

15. Garrett, Zeger. Latent class model diagnosis. Biometrics 2000; 56: 1055–1067.

16. Richardson, Green. On Bayesian analysis of mixtures with an unknown number of components. J R Stat Soc Series B 1997; 59: 731–792.

17. Frühwirth-Schnatter. Finite Mixture and Markov Switching Models. Springer Series in Statistics. New York, NY: Springer, 2006.

18. Liu, Hsu, Ravikumar, et al. Masked prediction: A parameter identifiability view. In: Koyejo S, Mohamed S, Agarwal A, et al. (eds) Advances in Neural Information Processing Systems. volume 35. New Orleans, LA: Curran Associates, Inc., 2022, pp. 21241–21254.

19. Rodríguez, Walker. Label switching in Bayesian mixture models: Deterministic relabeling strategies. J Comput Graph Stat 2014; 23: 25–45.

20. Celeux. Bayesian inference for mixtures: The label switching problem. In: Payne R and Green P (eds) COMPSTAT 98. Physica-Verlag, pp. 227–232.

21. Stephens. Dealing with label-switching in mixture models. J R Stat Soc Series B 2000; 62: 795–809.

22. Frühwirth-Schnatter. Markov chain Monte Carlo estimation of classical and dynamic switching and mixture models. J Am Stat Assoc 2001; 96: 194–209.

23. Papastamoulis, Iliopoulos. An artificial allocations based solution to the label switching problem in Bayesian analysis of mixtures of distributions. J Comput Graph Stat 2010; 19: 313–331.

24. Papastamoulis. label.switching: an R package for dealing with the label switching problem in MCMC outputs. J Stat Softw 2016; 69: 1–24.

25. Ponce, van Zon, Northrup, et al. Deploying a top-100 supercomputer for large parallel workloads: The Niagara supercomputer. In: Proceedings of the Practice and Experience in Advanced Research Computing on Rise of the Machines (Learning), PEARC ’19, 28 July 2019. New York, NY, USA: Association for Computing Machinery.

26. Loken, Gruner, Groer, et al. SciNet: lessons learned from building a power-efficient top-20 system and data centre. J Phys Conf Ser 2010; 256: 012026.

27. Brooks, Gelman. General methods for monitoring convergence of iterative simulations. J Comput Graph Stat 1998; 7: 434–455.

28. Smith. BOA: an R package for MCMC output convergence assessment and posterior inference. J Stat Softw 2007; 21: 1–37.

29. Gelman, Hwang, Vehtari. Understanding predictive information criteria for Bayesian models. Stat Comput 2014; 24: 997–1016.

30. Vehtari, Gelman, Gabry. Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Stat Comput 2017; 27: 1413–1432.

31. Frühwirth-Schnatter, Celeux, Robert (eds) Handbook of Mixture Analysis, chapter 12. Chapman & Hall/CRC, 2018. https://doi.org/10.1201/9780429055911.
