Latent classification of time-dependent transition rates in longitudinal binary outcome data

Abstract

Continuous-time Markov chain (CTMC) models and latent classification methods are commonly used to analyze longitudinal categorical outcomes in medical research. While CTMC models are popular for their simplicity and effectiveness, their assumption of constant transition rates presents limitations in capturing dynamic behaviors. To address this, non-homogeneous continuous-time Markov chains (NH-CTMCs) have been developed, incorporating time-varying transition rates to enhance model flexibility. In this study, we leverage closed-form transition probabilities for a fully ergodic two-state NH-CTMC model and propose a latent class clustering approach to identify heterogeneous transition rate patterns within the population. We emphasize the potential advantages of these models in health sciences, particularly for longitudinal studies where transition rates vary over time and across subgroups. Additionally, we demonstrate the practical application of our model using data from an ambulatory hypertension monitoring study.

Keywords

Longitudinal categorical data, continuous-time Markov chain, time-dependent transition modeling, latent class analysis, heterogeneous state transitions

1. Introduction

A continuous-time Markov chain (CTMC) is a stochastic model for a sequence of longitudinal states that satisfies the Markov property: the probability distribution of the next state depends solely on the current state, independent of past states.¹ Characterized by a state space and transition rates, a CTMC facilitates the derivation of transition probabilities between observed outcomes. Markov chains are applied across diverse fields, particularly in medical research, where they are used to analyze disease progression, treatment response, and other dynamic processes.^2–6

Despite their utility, Markov models often encounter limitations, particularly due to the assumption of constant transition rates over time. In medical applications, transition behaviors often evolve dynamically, influenced by factors such as aging, disease progression, or treatment effects.^7,8 To address this, several studies have explored methods for incorporating non-stationary transition rates.⁹ Notable examples include the modeling of Alzheimer’s disease progression with non-homogeneous Markov processes.¹⁰ Chang et al. derived exact transition probabilities for non-homogeneous continuous-time Markov chain (NH-CTMC) models, while Ngan approximated parameter estimates using Uniform Acceleration asymptotic expansion.^11–13 Other works defined theoretical settings or established probability approximation methods through simulations.^14–17

Another challenge in medical research is the presence of unobserved heterogeneity, where subjects exhibit different transition behaviors influenced by latent factors. Some variables may be unobserved and thus latent for various reasons, such as participants withholding information about their sexual behavior.¹⁸ Latent class analysis provides a powerful tool to account for such hidden structures, allowing researchers to cluster subjects with similar transition patterns.¹⁹ Applications include identifying patient subgroups from electronic health records,²⁰ modeling behavioral homophily in social networks,²¹ and classifying individuals based on clinical characteristics.²² In survival analysis and Markov modeling, latent classification helps differentiate subjects who may otherwise appear homogeneous. For example, Liang et al. proposed a model clustering subjects based on survival behavior, and Kuo et al. applied latent classification to homogeneous CTMCs.^23,24

NH-CTMC models offer a valuable extension for health science research, as they capture time-dependent transition behaviors more effectively than standard CTMC models. Accounting for both time-varying rates and heterogeneous transition patterns allows for a more nuanced understanding of disease progression and patient trajectories.¹¹ In this study, we propose a method that integrates NH-CTMC modeling with latent classification to identify distinct transition patterns among subjects. By clustering individuals based on their transition behaviors, we provide a framework for personalized modeling in longitudinal health studies. Specifically, we developed a method to classify latent subgroups based on longitudinal binary outcomes and covariates. Our approach assumes that each binary sequence follows a time-dependent CTMC, with distinct classes characterized by unique transition rate parameters. We validated the proposed method through simulations and applied it to the Dietary Approaches to Stop Hypertension (DASH) study dataset.²⁵ Inferences and analyses were conducted to investigate blood pressure state transitions and the impact of covariates among ambulatory patients with hypertension.

2. Methods

2.1. Non-homogeneous continuous-time Markov chain

Prior to model formulation, we defined key terms and variables. The model involves data with $n$ subjects, each indexed by $j$ with $m_{j}$ repeated measurements taken at specific time points. Each subject $j$ has a set of covariates denoted by $x_{j} \in R^{d}$. The state outcome of subject $j$ at time $t$ is represented by $Y_{j} (t)$ or simply $Y (t)$ when generalized. The possible states of $Y (t)$ belong to the set $S ≜ \{1, \dots, ν\}$; for our two-state model, we have $S ≜ \{0, 1\}$.

The probability of transitioning from state $y_{1}$ at time $s$ to state $y_{2}$ at time $t$ is defined as follows:

P_{y_{1} y_{2}} (s, t; x) ≜ P (Y (t) = y_{2} | Y (s) = y_{1}; x),

where $y_{1}, y_{2} \in S$, $x \in R^{d}$, and $0 \leq s \leq t$. The time-dependent transition rate matrix is denoted as $Q (t; x) \in R^{2 \times 2}$. We employed time-varying log-logistic transition rates:

λ (t; x) ≜ \frac{a t^{a - 1}}{1 + t^{a}} \cdot g (x),

where $a$ controls the baseline transition rate, and $g (x)$ modulates the effect of covariates. While Chang et al.¹¹ used an exponential covariate effect $e^{x^{⊤} β}$, we applied a positive linear effect to improve computational stability and interpretability:

g (x) ≜ \log_{2} (1 + 2^{x^{⊤} β}),

where $x$ is the covariate vector and $β$ is the vector of corresponding coefficients. The exponential form ensures positivity and monotonicity but can grow excessively steep, making numerical optimization unstable and potentially inflating rate estimates. The $\log_{2}$ form retains the desired positivity and monotonicity while moderating the growth of the covariate effect, resulting in a more controlled and stable estimation process. Additionally, similar to the exponential effect, the $\log_{2}$ effect yields an effect value of 1 when the covariates are zero, thereby preserving interpretability in terms of the baseline transition rate.
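
As a minimal illustration, the log-logistic rate and the $\log_{2}$ covariate effect defined above can be sketched in Python. The function names `covariate_effect` and `loglogistic_rate` are hypothetical and are not taken from the authors' implementation. The rewritten form max(η, 0) + log2(1 + 2^(−|η|)) with η = x^⊤ β is an algebraically equivalent expression chosen here, as an assumption, to avoid overflow for large |η|.

```python
import math


def covariate_effect(x, beta):
    """g(x) = log2(1 + 2^(x'beta)); equals 1 when x'beta = 0."""
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    # Equivalent overflow-safe form of log2(1 + 2^eta).
    return max(eta, 0.0) + math.log2(1.0 + 2.0 ** (-abs(eta)))


def loglogistic_rate(t, shape, x, beta):
    """lambda(t; x) = shape * t^(shape - 1) / (1 + t^shape) * g(x), for t > 0."""
    baseline = shape * t ** (shape - 1.0) / (1.0 + t ** shape)
    return baseline * covariate_effect(x, beta)
```

With all covariates equal to zero the effect is exactly 1, so the rate reduces to the baseline log-logistic term, which matches the interpretability argument in the text.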

The NH-CTMC framework developed by Chang et al.¹¹ served as the basis for our model, with the following modified transition rates:

\begin{cases} λ_{01} (t; x) ≜ \frac{a t^{a - 1}}{1 + t^{a}} \cdot \log_{2} (1 + 2^{x^{⊤} β_{01}}), \\ λ_{10} (t; x) ≜ \frac{b t^{b - 1}}{1 + t^{b}} \cdot \log_{2} (1 + 2^{x^{⊤} β_{10}}) . \end{cases}

Figure 1 from Chang et al.¹¹ demonstrates plots of log-logistic rate functions with different parameters and a binary univariate covariate $x$. The corresponding infinitesimal matrix $Q (t; x) \in R^{2 \times 2}$ is the following:

Q (t; x) ≜ \begin{pmatrix} - λ_{01} (t; x) & λ_{01} (t; x) \\ λ_{10} (t; x) & - λ_{10} (t; x) \end{pmatrix},

where $a$ and $b$ are shape parameters for the rates $λ_{01} (t; x)$ and $λ_{10} (t; x)$, respectively, and $β_{01}, β_{10} \in R^{d}$ are the covariate coefficients for each transition. The transition probabilities are the following:

{\begin{cases} P_{00} (s, t; x) = \frac{\log_{2} (1 + 2^{x^{⊤} β_{10}}) \int_{1 + s^{b}}^{1 + t^{b}} {(1 + (u - 1)^{a / b})}^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} u^{\log_{2} (1 + 2^{x^{⊤} β_{10}}) - 1} d u + (1 + s^{a})^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} \cdot (1 + s^{b})^{\log_{2} (1 + 2^{x^{⊤} β_{10}})}}{(1 + t^{a})^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} \cdot (1 + t^{b})^{\log_{2} (1 + 2^{x^{⊤} β_{10}})}}, \\ P_{01} (s, t; x) = \frac{\log_{2} (1 + 2^{x^{⊤} β_{01}}) \int_{1 + s^{a}}^{1 + t^{a}} {(1 + (v - 1)^{b / a})}^{\log_{2} (1 + 2^{x^{⊤} β_{10}})} v^{\log_{2} (1 + 2^{x^{⊤} β_{01}}) - 1} d v}{(1 + t^{a})^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} \cdot (1 + t^{b})^{\log_{2} (1 + 2^{x^{⊤} β_{10}})}}, \\ P_{10} (s, t; x) = \frac{\log_{2} (1 + 2^{x^{⊤} β_{10}}) \int_{1 + s^{b}}^{1 + t^{b}} {(1 + (u - 1)^{a / b})}^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} u^{\log_{2} (1 + 2^{x^{⊤} β_{10}}) - 1} d u}{(1 + t^{a})^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} \cdot (1 + t^{b})^{\log_{2} (1 + 2^{x^{⊤} β_{10}})}}, \\ P_{11} (s, t; x) = \frac{\log_{2} (1 + 2^{x^{⊤} β_{01}}) \int_{1 + s^{a}}^{1 + t^{a}} {(1 + (v - 1)^{b / a})}^{\log_{2} (1 + 2^{x^{⊤} β_{10}})} v^{\log_{2} (1 + 2^{x^{⊤} β_{01}}) - 1} d v + (1 + s^{a})^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} \cdot (1 + s^{b})^{\log_{2} (1 + 2^{x^{⊤} β_{10}})}}{(1 + t^{a})^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} \cdot (1 + t^{b})^{\log_{2} (1 + 2^{x^{⊤} β_{10}})}}, \end{cases}

(1)

with time points $0 \leq s \leq t$. To improve computational efficiency associated with the integral terms, we applied Simpson’s $1 / 3$ rule,²⁶ resulting in the following approximated transition probabilities:

\begin{aligned} Normalize: & {\begin{cases} P_{00} (s, t; x) \approx \frac{\log_{2} (1 + 2^{x^{⊤} β_{10}}) \cdot \frac{t^{b} - s^{b}}{6} (f_{U} (1 + s^{b}) + f_{U} (1 + t^{b}) + 4 f_{U} (1 + \frac{s^{b} + t^{b}}{2})) + (1 + s^{a})^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} \cdot (1 + s^{b})^{\log_{2} (1 + 2^{x^{⊤} β_{10}})}}{(1 + t^{a})^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} \cdot (1 + t^{b})^{\log_{2} (1 + 2^{x^{⊤} β_{10}})}}, \\ P_{01} (s, t; x) \approx \frac{\log_{2} (1 + 2^{x^{⊤} β_{01}}) \cdot \frac{t^{a} - s^{a}}{6} (f_{V} (1 + s^{a}) + f_{V} (1 + t^{a}) + 4 f_{V} (1 + \frac{s^{a} + t^{a}}{2}))}{(1 + t^{a})^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} \cdot (1 + t^{b})^{\log_{2} (1 + 2^{x^{⊤} β_{10}})}}, \end{cases} \\ Normalize: & {\begin{cases} P_{10} (s, t; x) \approx \frac{\log_{2} (1 + 2^{x^{⊤} β_{10}}) \cdot \frac{t^{b} - s^{b}}{6} (f_{U} (1 + s^{b}) + f_{U} (1 + t^{b}) + 4 f_{U} (1 + \frac{s^{b} + t^{b}}{2}))}{(1 + t^{a})^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} \cdot (1 + t^{b})^{\log_{2} (1 + 2^{x^{⊤} β_{10}})}}, \\ P_{11} (s, t; x) \approx \frac{\log_{2} (1 + 2^{x^{⊤} β_{01}}) \cdot \frac{t^{a} - s^{a}}{6} (f_{V} (1 + s^{a}) + f_{V} (1 + t^{a}) + 4 f_{V} (1 + \frac{s^{a} + t^{a}}{2})) + (1 + s^{a})^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} \cdot (1 + s^{b})^{\log_{2} (1 + 2^{x^{⊤} β_{10}})}}{(1 + t^{a})^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} \cdot (1 + t^{b})^{\log_{2} (1 + 2^{x^{⊤} β_{10}})}}, \end{cases} \end{aligned}

(2)

where $f_{U} (u) ≜ (1 + (u - 1)^{a / b})^{\log_{2} (1 + 2^{x^{⊤} β_{01}})} \cdot u^{\log_{2} (1 + 2^{x^{⊤} β_{10}}) - 1}$ and $f_{V} (v) ≜ (1 + (v - 1)^{b / a})^{\log_{2} (1 + 2^{x^{⊤} β_{10}})} \cdot v^{\log_{2} (1 + 2^{x^{⊤} β_{01}}) - 1}$. The transition probabilities originating from the same state are normalized to ensure they sum to 1. Further derivations with Chapman–Kolmogorov equations can be found in Appendix A.1 of Chang et al.¹¹
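
The Simpson approximation in Equation (2), followed by the row normalization, could be coded roughly as below. This is a sketch under stated assumptions rather than the authors' code: the function name `simpson_transition_probs` is hypothetical, and `g01` and `g10` stand for the precomputed covariate effects $\log_{2} (1 + 2^{x^{⊤} β_{01}})$ and $\log_{2} (1 + 2^{x^{⊤} β_{10}})$.

```python
def simpson_transition_probs(s, t, a, b, g01, g10):
    """Return ((P00, P01), (P10, P11)) following Equation (2), for 0 <= s <= t."""

    def f_u(u):
        return (1.0 + (u - 1.0) ** (a / b)) ** g01 * u ** (g10 - 1.0)

    def f_v(v):
        return (1.0 + (v - 1.0) ** (b / a)) ** g10 * v ** (g01 - 1.0)

    def simpson(f, lo, hi):
        # Simpson's 1/3 rule on a single interval [lo, hi].
        return (hi - lo) / 6.0 * (f(lo) + 4.0 * f(0.5 * (lo + hi)) + f(hi))

    int_u = g10 * simpson(f_u, 1.0 + s ** b, 1.0 + t ** b)
    int_v = g01 * simpson(f_v, 1.0 + s ** a, 1.0 + t ** a)
    stay = (1.0 + s ** a) ** g01 * (1.0 + s ** b) ** g10
    denom = (1.0 + t ** a) ** g01 * (1.0 + t ** b) ** g10

    row0 = ((int_u + stay) / denom, int_v / denom)
    row1 = (int_u / denom, (int_v + stay) / denom)
    # Normalize each row so probabilities from the same state sum to 1.
    return tuple(tuple(p / sum(row) for p in row) for row in (row0, row1))
```

For $a = b = 1$ and unit covariate effects the integrands are linear, so Simpson's rule is exact; with $s = 0$ and $t = 1$ the sketch returns $P_{01} = 0.375$, which agrees with the symmetric two-state solution $(1 - e^{- 2 \log 2}) / 2$.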

Figure 1.

Two-state non-homogeneous continuous-time Markov chains with $K = 3$ latent classes.

2.2. Maximum likelihood estimation

The likelihood function for the observed outcome data must be formulated to estimate the parameters. Assuming we have $K$ latent classes with different sets of parameters, that is, $θ_{k}$ for the $k$th latent chain for each $k = 1, \dots, K$ as in Figure 1, the likelihood for the $k$th latent Markov chain of the $j$th subject is the following:

L_{j k} (θ_{k}) ≜ \prod_{i = 0}^{m_{j} - 1} P_{y_{i}^{(j)}, y_{i + 1}^{(j)}} (t_{i}^{(j)}, t_{i + 1}^{(j)}; x_{j} | θ_{k}),

(3)

where $y_{i}^{(j)}, y_{i + 1}^{(j)} \in S$ are realizations of $Y_{i}^{(j)}$ and $Y_{i + 1}^{(j)}$, respectively, and $P_{y_{i}^{(j)}, y_{i + 1}^{(j)}} (t_{i}^{(j)}, t_{i + 1}^{(j)}; x_{j} | θ_{k})$ defines the transition probability from state $y_{i}$ at time $t_{i}^{(j)}$ to state $y_{i + 1}$ at time $t_{i + 1}^{(j)}$, with time points $0 \leq t_{i}^{(j)} \leq t_{i + 1}^{(j)}$ and covariate vector $x_{j} \in R^{d}$. Here, $m_{j}$ represents the number of observations for the $j$th subject, and $θ_{k}$ stands for the class-specific vector of rate parameters, that is, $θ_{k} ≜ \{a^{(k)}, b^{(k)}, β_{01}^{(k)}, β_{10}^{(k)}\}$, in the model.

To extend the NH-CTMC framework to incorporate latent classes with class-specific transition rates, let $π_{k}$ denote the probability that a subject belongs to the $k$th latent class, where $π_{k} \in (0, 1)$ for all $k \in \{1, \dots, K\}$ and $\sum_{k = 1}^{K} π_{k} = 1$. The full log-likelihood is then given by the following:

ℓ (θ, π) ≜ \sum_{j = 1}^{n} \log (\sum_{k = 1}^{K} π_{k} \prod_{i = 0}^{m_{j} - 1} P_{y_{i}^{(j)}, y_{i + 1}^{(j)}} (t_{i}^{(j)}, t_{i + 1}^{(j)}; x_{j} | θ_{k})) .

(4)
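
A compact way to evaluate Equation (4) is to work on the log scale and combine the class-specific terms with a log-sum-exp step, since products of many transition probabilities can underflow. The sketch below is illustrative only: `mixture_loglik`, `subject_class_loglik`, and the callable `trans_prob` are hypothetical names, `trans_prob(y1, y2, s, t, x, theta_k)` is assumed to return a strictly positive transition probability, and the log-sum-exp step is a numerical device chosen here, not something stated in the text.

```python
import math


def subject_class_loglik(states, times, x, theta_k, trans_prob):
    """log L_jk of Equation (3): sum of log transition probabilities."""
    total = 0.0
    for i in range(len(states) - 1):
        p = trans_prob(states[i], states[i + 1], times[i], times[i + 1], x, theta_k)
        total += math.log(p)  # assumes p > 0
    return total


def mixture_loglik(subjects, thetas, pis, trans_prob):
    """Equation (4); subjects is a list of (states, times, x) triples."""
    total = 0.0
    for states, times, x in subjects:
        terms = [
            math.log(pi_k) + subject_class_loglik(states, times, x, theta_k, trans_prob)
            for pi_k, theta_k in zip(pis, thetas)
        ]
        # log(sum_k exp(term_k)) computed stably by factoring out the maximum.
        m = max(terms)
        total += m + math.log(sum(math.exp(v - m) for v in terms))
    return total
```

In practice `trans_prob` would wrap the approximated probabilities of Equation (2); any optimizer that minimizes the negative of `mixture_loglik` could then be used.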

2.2.1. Optimization procedure

Computation of the maximum likelihood estimator (MLE) is commonly performed using the mle2 function from the R package bbmle.²⁷ The package utilizes the derivative-free Nelder-Mead optimization method,²⁸ but due to its heavy computational burden, it often experiences prolonged convergence times. To address this issue, we utilized the R package optimParallel,²⁹ which runs the limited-memory BFGS optimizer with bound constraints (L-BFGS-B), a quasi-Newton method for box-constrained optimization problems, with parallel processing.^30–32 The Nelder-Mead optimizer was used as a fallback when the L-BFGS-B method did not achieve convergence.

The log-likelihood of Equation (4) served as the objective function for computing the MLE of $θ$ and $π$. The corresponding derivative of the function was approximated by the optimParallel() command, with constraints of $a > 0$ and $b > 0$ for all latent classes. Standard errors (SEs) were estimated based on the Hessian matrix approximated by optimParallel. According to large-sample theory, the distribution of the MLE converges to a normal distribution with the covariance matrix given by the inverse of the observed Fisher information matrix, or equivalently the inverse of the negative Hessian matrix of the log-likelihood.^33,34 Consequently, the SEs of the parameter estimates are the square roots of the diagonal elements of the inverted matrix.
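
The step from a Hessian to SEs can be illustrated with a small, dependency-free Python sketch. It is not the optimParallel routine: the central finite-difference Hessian and the Gauss-Jordan inverse below are stand-ins chosen for illustration, and all function names are hypothetical.

```python
import math


def numerical_hessian(f, theta, h=1e-4):
    """Central finite-difference Hessian of f at theta."""
    n = len(theta)
    hess = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(di, dj):
                p = list(theta)
                p[i] += di * h
                p[j] += dj * h
                return f(p)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return hess


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def standard_errors(loglik, theta_hat):
    """Square roots of the diagonal of the inverse negative Hessian."""
    hess = numerical_hessian(loglik, theta_hat)
    cov = invert([[-v for v in row] for row in hess])
    # math.sqrt raises ValueError if the Hessian is not negative definite.
    return [math.sqrt(cov[i][i]) for i in range(len(theta_hat))]
```

For a quadratic log-likelihood such as −((θ₁ − 1)²/4 + (θ₂ + 2)²/9)/2, the sketch recovers SEs of approximately 2 and 3 at the maximizer.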

Another computational issue is the choice of initial values for the optimization procedure. Due to the presence of multiple latent classes, the log-likelihood of Equation (4) could exhibit several local maxima and saddle points, posing a risk of the optimization algorithm failing to converge to the desired global maximum. The convergence results could be sensitive to the starting values, and hence obtaining initial values close to the true MLE values may drastically improve convergence and optimality. We implemented Algorithm 1 to find suitable initial values, setting the upper limit for rate estimates to $κ = 7$ and initializing the covariate coefficients to ${\hat{β}}_{k}^{(0)} = \vec{0}$ for all $k$. Building on the approach by Kuo et al.,²⁴ we used $K$-means clustering³⁵ to estimate the initial values of ${\hat{π}}_{k}^{(0)}$ and ${\hat{θ}}_{k}^{(0)}$ for each latent class $k$, based on the assumption of a homogeneous-rate model, which provided an approximate but distinct starting point for the optimization iteration. The initial estimates of ${\hat{θ}}_{k}^{(0)}$ were obtained using a heuristic approach motivated by the homogeneous-rate CTMC framework, where transition rates were estimated as the average of inverse log-time differences between observed state transitions. The initial cluster probabilities ${\hat{π}}_{k}^{(0)}$ were set according to the relative sizes of the cluster sets $C_{k}$ identified by the $K$-means algorithm.

Additionally, to further reduce the computation time of the optimization procedure, we removed some constraints by transforming variables into a more amenable form. The latent class probabilities $π$ are constrained to lie strictly between 0 and 1, that is, $π \in (0, 1)$, and must sum to 1, that is, $π_{1} + \dots + π_{K} = 1$. By replacing $π$ with the following sigmoid reparametrization:

π_{k} ≜ \frac{e^{ρ_{k}}}{1 + \sum_{i = 1}^{K - 1} e^{ρ_{i}}}, for k = 1, \dots, K - 1,

(5)

we are able to ensure that the $π$ variables take values within 0 and 1. To ensure that the probabilities sum to 1, we formulate the last ($K$th) cluster probability $π_{K}$ as the following:

π_{K} ≜ \frac{1}{1 + \sum_{i = 1}^{K - 1} e^{ρ_{i}}} .

(6)

The optimization procedure is then based on the unconstrained $ρ_{j}$ variables, where $j = 1, \dots, K - 1$. When cluster probabilities depend on covariates, the transformation can also incorporate $x$ by adding $x^{⊤} β$ to $ρ$. SEs of $ρ$’s were estimated using the delta method, with further details provided in Appendix A.
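
The mapping of Equations (5) and (6), together with its inverse, can be sketched as follows. The function names `rho_to_pi` and `pi_to_rho` are hypothetical, and subtracting the maximum before exponentiating is a standard numerical safeguard added here as an assumption; it does not change the resulting probabilities.

```python
import math


def rho_to_pi(rho):
    """Map K-1 unconstrained values to K class probabilities (Equations (5)-(6))."""
    # Shift by m so that no exponent is positive; the shift cancels in the ratio.
    m = max(0.0, max(rho)) if rho else 0.0
    weights = [math.exp(r - m) for r in rho] + [math.exp(-m)]
    total = sum(weights)
    return [w / total for w in weights]


def pi_to_rho(pi):
    """Inverse map: rho_k = log(pi_k / pi_K) for k = 1, ..., K-1."""
    return [math.log(p / pi[-1]) for p in pi[:-1]]
```

For example, `pi_to_rho([0.5, 0.3, 0.2])` gives the two unconstrained values log 2.5 and log 1.5, and `rho_to_pi` maps them back to the original probabilities.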

2.3. Latent class prediction at individual level

The maximum likelihood estimates $\hat{θ}$ and $\hat{π}$ were used to infer the latent class assignment of each individual. Specifically, we applied Bayes’ rule to compute the model-based predictive probability of class assignment based on the MLEs:

{\hat{p}}_{j, k} ≜ P (Z_{j} = k ∣ {Y_{j} (t), x_{j}}; \hat{θ}) = \frac{P ({Y_{j} (t), x_{j}}, Z_{j} = k; {\hat{θ}}_{k})}{\sum_{l = 1}^{K} P ({Y_{j} (t), x_{j}}, Z_{j} = l; {\hat{θ}}_{l})},

(7)

where ${\hat{p}}_{j, k}$ denotes the estimated probability of subject $j$ belonging to latent class $k$, and $Z_{j}$ represents the random variable for the latent class of subject $j$. The numerator represents the likelihood of the $j$th subject with the parameters estimated for latent class $k$, and the denominator represents the overall likelihood for subject $j$. The individual likelihood for each latent class $k$ is expressed and computed as the following:

P ({Y_{j} (t), x_{j}}, Z_{j} = k; {\hat{θ}}_{k}) = P (Z_{j} = k) \cdot P ({Y_{j} (t), x_{j}} ∣ Z_{j} = k; {\hat{θ}}_{k}) \equiv {\hat{π}}_{k} L_{j k} ({\hat{θ}}_{k}),

(8)

for $k = 1, \dots, K$. The optimal class assignment for each subject is the class with the maximum probability:

{\hat{z}}_{j} ≜ \underset{k \in {1, \dots, K}}{arg max} {\hat{p}}_{j, k},

(9)

where $z_{j}$ represents the realization of the latent class $Z_{j}$, and ${\hat{z}}_{j}$ denotes the estimated value of the unobserved $z_{j}$.
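
Equations (7) to (9) amount to normalizing the class-weighted likelihoods of a subject and taking the arg max. A minimal Python sketch is given below, assuming the class-specific log-likelihoods $\log L_{j k} ({\hat{θ}}_{k})$ have already been computed; the function names are hypothetical, and working on the log scale is a choice made here to avoid underflow for long sequences.

```python
import math


def posterior_class_probs(pi_hat, class_logliks):
    """Equation (7): posterior class probabilities from log L_jk values."""
    log_joint = [math.log(p) + ll for p, ll in zip(pi_hat, class_logliks)]
    m = max(log_joint)
    weights = [math.exp(v - m) for v in log_joint]  # largest weight is exactly 1
    total = sum(weights)
    return [w / total for w in weights]


def assign_class(pi_hat, class_logliks):
    """Equation (9): class label in 1..K with the largest posterior probability."""
    probs = posterior_class_probs(pi_hat, class_logliks)
    return max(range(len(probs)), key=probs.__getitem__) + 1
```

Even with log-likelihoods as small as −1000, where direct exponentiation would underflow to zero, the normalized probabilities remain well defined.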

3. Simulation study

3.1. Overview

We conducted a simulation study to evaluate the performance of the proposed method under controlled conditions. We generated 1000 datasets each for sample sizes $n = 250$ , $500$ , and $1,000$ , totaling 3000 simulated datasets, over a fixed observation window of $T = 20$ and $K = 3$ latent classes. For each dataset, we estimated model parameters via maximum likelihood and assessed the estimation accuracy by examining bias, SEs, and coverage probabilities (CPs). We further evaluated model selection performance using the generalized information criterion (GIC) and compared latent class prediction accuracy against several competing longitudinal clustering approaches.

3.2. Algorithms and propositions

The simulation algorithm for the two-state NH-CTMC model, proposed by Chang et al.,¹¹ involves a thinning process of a non-homogeneous Poisson process^36,37 and sequentially selecting the first event as the transition event. In this paper, we present an alternative simulation algorithm, detailed in Algorithm 2. The algorithm simulates exponential random variables $V_{j}$ for each transition to state $j \in S ∖ \{i\}$ and determines the transition event times by solving an integral equation in the transition rate, as described in line 5 of Algorithm 2. For a general $ν$-state NH-CTMC model, the next state is determined by simulating a transition time for each possible destination state $k$ and selecting the state with the minimum transition time, denoted $T_{k}$. The following propositions validate the algorithm and the distribution of transition time. Proofs of these propositions are provided in Appendix B.

Proposition 1
Let $V$ be a random variable following the exponential distribution with unit mean, that is, $V \sim Exp (1)$, and let $T$ be the solution of the equation:
$\int_{s}^{T} λ (u) d u = V,$
where $s \geq 0$ is a fixed constant. Then, $T$ follows the distribution with the following probability density function (PDF):
$f (t) = λ (t) \exp \{- \int_{s}^{t} λ (u) d u\} .$

Proposition 2
Suppose a subject is in state $i$ at time $t_{i}$. Based on Proposition 1, for $j \in S ∖ \{i\}$, or simply for all $j \neq i$, let $T_{j}$ be independent random variables with the following PDF:
$f_{j} (t) = λ_{i j} (t) \exp \{- \int_{t_{i}}^{t} λ_{i j} (u) d u\},$
where $λ_{i j} (t)$ is the transition rate function from state $i$ to $j$, bounded and integrable for all $j \neq i$. Then, we have the following properties for $T ≜ \min_{j \neq i} \{T_{j}\}$:
1. $T$ follows the distribution with the following PDF:
$f_{T} (t) = (\sum_{j \neq i} λ_{i j} (t)) \exp \{- \int_{t_{i}}^{t} \sum_{k \neq i} λ_{i k} (u) d u\} I_{(0, \infty)} (t),$ and
2. the probability that the minimum is attained by destination state $k$ is
$P (T = T_{k} | \min_{j \neq i} \{T_{j}\} = t) = \frac{λ_{i k} (t)}{\sum_{j \neq i} λ_{i j} (t)}$.
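
For the log-logistic rates of Section 2.1, the equation in Proposition 1 can be solved in closed form. Since $\int_{s}^{T} \frac{a u^{a - 1}}{1 + u^{a}} g \, d u = g [\log (1 + T^{a}) - \log (1 + s^{a})]$, setting this equal to $V$ gives $T = ((1 + s^{a}) e^{V / g} - 1)^{1 / a}$. The Python sketch below uses this inversion to simulate a two-state path; it is an illustration of the idea and not a reproduction of Algorithm 2. All function names are hypothetical, `g01` and `g10` denote the precomputed covariate effects, and with only two states there is a single destination, so the minimum in Proposition 2 is trivial.

```python
import math
import random


def next_transition_time(s, shape, g, rng):
    """Solve integral_s^T lambda(u) du = V, V ~ Exp(1), for a log-logistic rate."""
    exponent = rng.expovariate(1.0) / g
    if exponent > 700.0:  # guard against overflow in exp for very small g
        return math.inf
    return ((1.0 + s ** shape) * math.exp(exponent) - 1.0) ** (1.0 / shape)


def simulate_path(a, b, g01, g10, horizon, initial_state=0, seed=None):
    """Return the list of (jump_time, new_state) pairs up to the horizon."""
    rng = random.Random(seed)
    t, state = 0.0, initial_state
    path = [(t, state)]
    while True:
        if state == 0:
            t = next_transition_time(t, a, g01, rng)
        else:
            t = next_transition_time(t, b, g10, rng)
        if t > horizon:
            return path
        state = 1 - state
        path.append((t, state))


def state_at(path, t):
    """State occupied at time t, used to read off discrete observations."""
    current = path[0][1]
    for jump_time, state in path:
        if jump_time > t:
            break
        current = state
    return current
```

Discrete observations at $t = 0, 1, \dots, T$ can then be obtained as `[state_at(path, k) for k in range(T + 1)]`.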

3.3. Generalized information criterion

To determine the optimal number of clusters for each simulated dataset, we adapted the GIC,^38,39 a generalized extension of the Akaike information criterion (AIC).^40,41 The GIC is defined as the following:

GIC = - 2 ℓ (θ, π) + η p,

where $p$ denotes the number of model parameters and $η$ represents the penalty weight associated with $p$, where $η = 2$ in the case of AIC. In the latent NH-CTMC model, the number of parameters $p$ is calculated as the following:

p = (2 d + 3) K - 1,

where the expression accounts for the total number of rate parameters $a$’s and $b$’s, covariate parameters $β$’s, and probability parameters $π$’s. For both the simulation and data application studies, we utilized GIC with $η = 10$ to impose a stronger penalty on model complexity and to discourage rapid parameter growth as the number of latent classes increases. Figure 2 illustrates the mean GIC values for latent NH-CTMC models with $K = 1$ to $5$ latent classes across simulations at each sample size. The model with $K = 3$ latent classes consistently yielded the optimal GIC in a majority of runs: 768 out of 1000 for $n = 250$, 997 out of 1000 for $n = 500$, and all 1000 for $n = 1,000$, indicating the increasing stability of model selection as sample size grows.
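
The criterion and the parameter count are simple enough to state directly in code. In the Python sketch below the function names are hypothetical, and the log-likelihood values in the usage note are made-up numbers for illustration, not results from the study.

```python
def n_parameters(d, K):
    """p = (2d + 3)K - 1: a, b, beta_01, beta_10 per class, plus K - 1 free pi's."""
    return (2 * d + 3) * K - 1


def gic(loglik, d, K, eta=10.0):
    """GIC = -2 * loglik + eta * p; eta = 2 recovers the AIC."""
    return -2.0 * loglik + eta * n_parameters(d, K)


def select_n_classes(logliks_by_K, d, eta=10.0):
    """Return the K with minimum GIC and the dictionary of all GIC values."""
    scores = {K: gic(ll, d, K, eta) for K, ll in logliks_by_K.items()}
    return min(scores, key=scores.get), scores
```

With one covariate ($d = 1$) and $K = 3$, the count is $p = 14$, which matches the 14 free parameters of the simulation design (twelve rate and covariate parameters plus two free class probabilities). With hypothetical maximized log-likelihoods of −1000, −950, −900, −895, and −893 for $K = 1, \dots, 5$, `select_n_classes` returns $K = 3$.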

Figure 2.

Generalized Information Criterion (GIC; $η = 10$ ) values for latent NH-CTMC models with 1 to 5 latent classes computed from simulated datasets, with each panel corresponding to $n = 250$ , $500$ , and $1,000$ samples, respectively. The 3-class model achieved the optimal (minimum) GIC in 768 out of 1000 runs for $n = 250$ , 997 out of 1000 runs for $n = 500$ , and all 1000 runs for $n = 1,000$ .

3.4. Simulation results

Simulation results for a single non-latent NH-CTMC model have been previously presented by Chang et al.¹¹ Therefore, in this study, we primarily focus on the results obtained from the latent model simulations. A univariate covariate $x_{j}$ for each subject $j$ was independently and identically simulated from a normal distribution with a mean of $0$ and a standard deviation (SD) of $0.3$. Using Algorithm 2 with prespecified parameter values, data were simulated and collected at discrete time points $t = 0, 1, \dots, T$. Log-likelihoods were computed based on the discrete observations, and MLEs were estimated to compare the accuracy of the estimates to the true parameter values. Simulations and computations were executed on the servers of the Texas Advanced Computing Center at the University of Texas at Austin and Louisiana Optical Network Infrastructure High Performance Computing at Louisiana State University. Table 1 presents simulation results from 1000 runs for each sample size under the latent two-state NH-CTMC model with $K = 3$ latent classes. The CPs of most parameter estimates were close to the nominal 95% level, and percentage biases were under 3% for most parameters. The SD of the maximum likelihood estimates closely matched the mean of the estimated SEs across the simulations, suggesting that the SE estimates are approximately unbiased. In addition, as the sample size increased, we observed improved performance, including higher and more stable CPs, reduced bias, and smaller SDs and SEs that more closely aligned with one another.

Table 1.
Simulation results of the latent NH-CTMC model based on 1000 simulation runs for each sample size $n = 250$ , $500$ , and $1,000$ (3000 runs in total), with time up to $T = 20$ and $K = 3$ latent classes, listing means of estimates, bias, PB, SD, square root of mean of squared estimated SE, and empirical CP of 95% confidence intervals.

Parameter	Value	$n$	MLE	Bias	PB	SD	SE	CP
$a^{(1)}$	9.0	250	8.8779	−0.1221	1.36%	1.3642	0.9988	91.7%
		500	8.7328	−0.2672	2.97%	0.6433	0.6585	90.9%
		1000	8.6511	−0.3489	3.88%	0.4366	0.4469	83.9%
$b^{(1)}$	5.0	250	5.1144	0.1144	2.29%	0.7689	0.6426	91.2%
		500	4.9993	−0.0007	0.01%	0.4243	0.4188	93.8%
		1000	4.9318	−0.0682	1.36%	0.2819	0.2828	93.0%
$β_{01}^{(1)}$	5.0	250	4.9083	−0.0917	1.83%	0.9267	0.7988	90.4%
		500	4.9262	−0.0738	1.48%	0.6065	0.5552	92.4%
		1000	4.9173	−0.0827	1.65%	0.3812	0.3894	93.1%
$β_{10}^{(1)}$	3.0	250	2.6035	−0.3965	13.22%	0.8160	0.6747	86.0%
		500	2.6665	−0.3335	11.12%	0.5233	0.4667	84.3%
		1000	2.6644	−0.3356	11.19%	0.3212	0.3237	80.7%
$a^{(2)}$	5.0	250	5.0194	0.0194	0.39%	0.7284	0.6897	92.0%
		500	5.0435	0.0435	0.87%	0.4670	0.4630	95.4%
		1000	5.0425	0.0425	0.85%	0.3260	0.3174	93.6%
$b^{(2)}$	10.0	250	10.2003	0.2003	2.00%	1.7499	1.5801	92.4%
		500	10.2058	0.2058	2.06%	1.0461	1.0438	95.5%
		1000	10.1488	0.1488	1.49%	0.7062	0.7124	96.4%
$β_{01}^{(2)}$	−1.5	250	−1.3700	0.1300	8.66%	1.0214	0.8647	90.5%
		500	−1.5009	−0.0009	0.06%	0.6008	0.5727	94.8%
		1000	−1.5203	−0.0203	1.35%	0.3890	0.3913	95.1%
$β_{10}^{(2)}$	1.0	250	1.2480	0.2480	24.80%	1.1376	0.8668	89.8%
		500	1.0351	0.0351	3.51%	0.6094	0.5584	95.7%
		1000	0.9963	−0.0037	0.37%	0.3861	0.3786	94.5%
$a^{(3)}$	2.0	250	2.1646	0.1646	8.23%	0.5485	0.4615	91.7%
		500	2.0359	0.0359	1.79%	0.2999	0.2943	94.2%
		1000	1.9983	−0.0017	0.09%	0.2041	0.1999	93.6%
$b^{(3)}$	1.0	250	1.0285	0.0285	2.85%	0.2424	0.2152	93.7%
		500	0.9973	−0.0027	0.27%	0.1450	0.1424	94.1%
		1000	0.9842	−0.0158	1.58%	0.0996	0.0978	93.2%
$β_{01}^{(3)}$	1.5	250	1.3806	−0.1194	7.96%	1.6089	1.4288	90.6%
		500	1.4975	−0.0025	0.17%	0.9960	0.9801	93.6%
		1000	1.5358	0.0358	2.39%	0.6744	0.6826	94.6%
$β_{10}^{(3)}$	−2.0	250	−2.1647	−0.1647	8.23%	1.7032	1.5859	92.0%
		500	−2.0433	−0.0433	2.16%	1.0826	1.0441	93.3%
		1000	−1.9855	0.0145	0.73%	0.7068	0.7152	95.2%
$π_{1}$	0.5	250	0.5012	0.0012	0.24%	0.0508	0.0571	97.0%
		500	0.5058	0.0058	1.15%	0.0328	0.0395	97.5%
		1000	0.5070	0.0070	1.39%	0.0224	0.0278	97.1%
$π_{2}$	0.3	250	0.2936	−0.0064	2.14%	0.0373	0.0452	97.2%
		500	0.2935	−0.0065	2.18%	0.0247	0.0317	97.4%
		1000	0.2950	−0.0050	1.68%	0.0170	0.0223	98.4%
$π_{3}$	0.2	250	0.2053	0.0053	2.63%	0.0444	0.0484	96.1%
		500	0.2008	0.0008	0.39%	0.0275	0.0330	98.0%
		1000	0.1981	−0.0019	0.96%	0.0196	0.0230	96.9%

NH-CTMC: non-homogeneous continuous-time Markov chain; PB: percentage bias; SD: standard deviation; SE: standard error; CP: coverage probability; MLE: maximum likelihood estimator.

3.5. Comparative evaluation of latent class prediction methods

To further validate the proposed method, we compared its ability to classify latent classes against other well-known longitudinal clustering models. Specifically, we compared with the following five algorithms:

1. $k$-means longitudinal clustering (KmL):⁴² An extended method of the $k$-means algorithm for longitudinal data, optimized to handle irregular time points.
2. Two-step clustering using a linear model and $k$-means (LMKM):⁴³ A two-step hybrid method that first fits individual-level linear models to summarize temporal trajectories, followed by $k$-means clustering on the estimated coefficients.
3. Finite Gaussian mixture model (GMM):⁴⁴ A model-based clustering technique that assumes data arise from a mixture of Gaussian distributions, offering a probabilistic framework for clustering.
4. Latent class mixed model (LCMM):^45,46 A flexible longitudinal model that accommodates heterogeneity within latent classes and allows inclusion of covariates through mixed-effects modeling.
5. Latent continuous-time Markov chain (LCTMC):²⁴ A model-based clustering approach using CTMCs with latent class structure.

We utilized the R packages latrend⁴³ and lcmm⁴⁷ to fit these models. For each method, the number of clusters was set to $K = 3$, and covariates were included for clustering only in LCMM.

Table 2 presents the latent class prediction performance metrics: precision, sensitivity, specificity, and $F_{1}$-score for each latent class $k$, calculated as follows:
$\begin{aligned} \text{Precision}^{(k)} & = \frac{\#\{j : z_{j} = k, \hat{z}_{j} = k\}}{\#\{j : \hat{z}_{j} = k\}}, \\ \text{Sensitivity}^{(k)} & = \frac{\#\{j : z_{j} = k, \hat{z}_{j} = k\}}{\#\{j : z_{j} = k\}}, \\ \text{Specificity}^{(k)} & = \frac{\#\{j : z_{j} \neq k, \hat{z}_{j} \neq k\}}{\#\{j : z_{j} \neq k\}}, \\ F_{1}\text{-score}^{(k)} & = \frac{2 \cdot \text{Precision}^{(k)} \cdot \text{Sensitivity}^{(k)}}{\text{Precision}^{(k)} + \text{Sensitivity}^{(k)}}, \end{aligned}$
with the following overall accuracy:
$\text{Overall Accuracy} = \frac{\#\{j : z_{j} = \hat{z}_{j}\}}{n} .$
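As an illustration of how these class-specific metrics can be computed from true and predicted class labels, a minimal Python sketch is given below. The function names `class_metrics` and `overall_accuracy` are hypothetical and are not taken from the authors' code. The sketch assumes that predicted cluster labels have already been aligned with the true class labels.

```python
def class_metrics(z_true, z_pred, k):
    """Precision, sensitivity, specificity, and F1-score for latent class k.

    z_true and z_pred are equal-length sequences of true and predicted
    class labels (hypothetical inputs, one entry per subject j).
    """
    pairs = list(zip(z_true, z_pred))
    tp = sum(1 for z, zh in pairs if z == k and zh == k)   # z_j = k, zhat_j = k
    fp = sum(1 for z, zh in pairs if z != k and zh == k)   # z_j != k, zhat_j = k
    fn = sum(1 for z, zh in pairs if z == k and zh != k)   # z_j = k, zhat_j != k
    tn = sum(1 for z, zh in pairs if z != k and zh != k)   # z_j != k, zhat_j != k

    nan = float("nan")  # returned when a denominator is empty
    precision = tp / (tp + fp) if tp + fp else nan
    sensitivity = tp / (tp + fn) if tp + fn else nan
    specificity = tn / (tn + fp) if tn + fp else nan
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else nan
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def overall_accuracy(z_true, z_pred):
    """Share of subjects whose predicted class equals the true class."""
    return sum(z == zh for z, zh in zip(z_true, z_pred)) / len(z_true)
```

In a simulation study, these functions would be applied to each replicate and the results averaged over replicates to obtain means such as those reported in Table 2.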
Compared with the other longitudinal clustering models (KmL, LMKM, GMM, LCMM, and LCTMC), the latent NH-CTMC model achieved the highest overall accuracy and the highest class-specific $F_{1}$-scores at every sample size. While competing methods yielded overall accuracies below 64%, the NH-CTMC model achieved notably higher and more stable accuracy rates of 78.43%, 79.73%, and 80.27% for $n = 250$, $500$, and $1,000$, respectively. Class-specific metrics further reinforce the robustness of the NH-CTMC approach: at $n = 1,000$, the NH-CTMC model attained sensitivities of 85.23% for Class 1, 79.46% for Class 2, and 69.09% for Class 3, with $F_{1}$-scores of 82.48%, 81.29%, and 72.57%, respectively. The only competitor to exceed it on individual metrics was LCTMC, for example in Class 1 sensitivity (90.35% at $n = 1,000$) and in Class 2 precision and specificity, but this came at the cost of low Class 1 specificity (39.77%) and low Class 3 sensitivity (12.66%). The NH-CTMC model also exhibited strong specificity across classes and sample sizes, demonstrating its effectiveness in accurately distinguishing among latent trajectories and its stability as more data become available.

Table 2.
Latent class prediction analysis based on 1000 simulation runs for each sample size ( $n = 250$ , $500$ , and $1,000$ ), time up to $T = 20$ , and $K = 3$ latent classes, presenting means of precision, sensitivity, specificity, and $F_{1}$ -score for each class, as well as overall accuracy for the KmL, LMKM, GMM, LCMM, LCTMC, and latent NH-CTMC models.

	$n$	KmL	LMKM	GMM	LCMM	LCTMC	NH-CTMC
Class 1
Precision	250	0.6605	0.5448	0.6712	0.6594	0.6123	0.7904
	500	0.6601	0.5292	0.6704	0.6565	0.6067	0.7973
	1000	0.6598	0.5221	0.6404	0.6572	0.6015	0.7999
Sensitivity	250	0.5394	0.4645	0.6407	0.5714	0.8677	0.8311
	500	0.5605	0.4750	0.6399	0.5804	0.8938	0.8463
	1000	0.5738	0.4766	0.6209	0.5945	0.9035	0.8523
Specificity	250	0.7231	0.6066	0.6857	0.7064	0.4279	0.7763
	500	0.7119	0.5750	0.6844	0.6985	0.4090	0.7824
	1000	0.7041	0.5628	0.6334	0.6921	0.3977	0.7857
$F_{1}$-score	250	0.5897	0.4974	0.6493	0.6079	0.7053	0.8077
	500	0.6039	0.4987	0.6515	0.6129	0.7152	0.8199
	1000	0.6128	0.4974	0.6189	0.6218	0.7186	0.8248
Class 2
Precision	250	0.6241	0.3908	0.6010	0.6474	0.8742	0.8228
	500	0.6252	0.3771	0.5950	0.6479	0.8831	0.8309
	1000	0.6272	0.3715	0.5603	0.6527	0.8873	0.8341
Sensitivity	250	0.6921	0.4055	0.7033	0.6780	0.5215	0.7726
	500	0.6954	0.3630	0.6961	0.6817	0.5314	0.7860
	1000	0.7007	0.3489	0.6124	0.6894	0.5382	0.7946
Specificity	250	0.8189	0.7263	0.7928	0.8335	0.9630	0.9253
	500	0.8195	0.7419	0.7950	0.8342	0.9666	0.9299
	1000	0.8199	0.7466	0.8028	0.8372	0.9690	0.9315
$F_{1}$-score	250	0.6545	0.3923	0.6407	0.6549	0.6434	0.7921
	500	0.6573	0.3665	0.6378	0.6587	0.6586	0.8057
	1000	0.6612	0.3581	0.5858	0.6663	0.6676	0.8129
Class 3
Precision	250	0.1880	0.2584	0.4075	0.2337	0.5214	0.7461
	500	0.1703	0.2545	0.2984	0.2076	0.5474	0.7630
	1000	0.1636	0.2533	0.2100	0.1965	0.5356	0.7701
Sensitivity	250	0.2565	0.3296	0.2547	0.3011	0.1892	0.6850
	500	0.2119	0.3299	0.1985	0.2613	0.1518	0.6916
	1000	0.1880	0.3297	0.1938	0.2356	0.1266	0.6909
Specificity	250	0.7424	0.7602	0.8556	0.7658	0.9251	0.9356
	500	0.7521	0.7569	0.8372	0.7672	0.9444	0.9439
	1000	0.7608	0.7561	0.8178	0.7739	0.9518	0.9472
$F_{1}$-score	250	0.2135	0.2862	0.2823	0.2564	0.2325	0.7030
	500	0.1868	0.2858	0.2223	0.2257	0.2124	0.7203
	1000	0.1743	0.2857	0.1949	0.2095	0.1936	0.7257
Overall accuracy	250	0.5286	0.4198	0.5823	0.5493	0.6281	0.7843
	500	0.5312	0.4124	0.5685	0.5470	0.6367	0.7973
	1000	0.5347	0.4089	0.5329	0.5512	0.6385	0.8027

KmL: $k$-means longitudinal clustering; LMKM: linear model and $k$-means; GMM: Gaussian mixture model; LCMM: latent class mixed model; LCTMC: latent continuous-time Markov chain; NH-CTMC: non-homogeneous continuous-time Markov chain.

4. Application to hypertension study

We applied our latent NH-CTMC model to an ambulatory blood pressure dataset for a comprehensive analysis of hypertension. Hypertension is a major risk factor for various health issues, including heart disease and stroke. Approximately 78 million people in the United States are affected by high blood pressure, significantly increasing the risk of cardiovascular diseases.^48–50 Around 48.1% of adults are expected to develop hypertension,⁵¹ and its national and global prevalence continues to rise.^52,53 Research has highlighted links between hypertension and factors such as diet, age, and body mass index (BMI).^25,54,55 To better understand the dynamics of hypertension, we applied our model to the Dietary Approaches to Stop Hypertension (DASH) study, a multicenter randomized controlled trial aimed at determining the effects of dietary patterns on reducing blood pressure.²⁵ The DASH dataset contains 24-hour ambulatory blood pressure measurements for 341 patients, recorded approximately every 30 minutes. Each record includes values for systolic blood pressure (SBP), observation time, and covariate information such as dietary plan, age, and BMI, with data extracted starting from the time $t = 0$ when each subject woke up.

Hypertension states, measurement times, and additional covariates were derived from the dataset. Hypertension states were determined based on SBP measures, classified into two states: (0) normal and (1) hypertensive, corresponding to SBP levels less than 130 mmHg and greater than or equal to 130 mmHg, respectively.⁵⁶ Measurement time was calculated as the elapsed time from the patient waking up, recorded in hours. Three covariates were considered: dietary plan, age, and BMI. Patients were initially categorized into three dietary interventions: the control diet, an intermediate diet, and a reduced diet with lower fat content and higher intake of vegetables, fruits, proteins, and dairy products. The patients with an intermediate diet had a dietary plan similar to the control diet group overall, except for higher intakes of fruits and vegetables. The dietary intervention was treated as a binary covariate, with the first group (0) consisting of control and intermediate diet groups, and the second group (1) assigned to the reduced diet group. Age was a continuous covariate directly extracted from the dataset, while BMI was calculated from weight and height.
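The derivation of states and covariates described above can be sketched in Python as follows. This is a minimal illustration only: the function names, the diet labels, and the assumption that weight is recorded in kilograms and height in metres are hypothetical and do not reflect the actual variable coding of the DASH dataset.

```python
SBP_THRESHOLD_MMHG = 130.0  # cut-off between normal and hypertensive states


def hypertension_state(sbp_mmhg):
    """Return 1 (hypertensive) if SBP >= 130 mmHg, otherwise 0 (normal)."""
    return 1 if sbp_mmhg >= SBP_THRESHOLD_MMHG else 0


def diet_indicator(diet):
    """Binary diet covariate: control and intermediate -> 0, reduced -> 1.

    The string labels are assumed for illustration.
    """
    groups = {"control": 0, "intermediate": 0, "reduced": 1}
    return groups[diet]


def body_mass_index(weight_kg, height_m):
    """BMI in kg/m^2, assuming weight in kilograms and height in metres."""
    return weight_kg / height_m ** 2


def hours_since_waking(minutes_since_waking):
    """Measurement time in hours, with t = 0 at the time of waking."""
    return minutes_since_waking / 60.0
```

For example, a reading of 134 mmHg taken 90 minutes after waking from a subject on the intermediate diet would be coded as state 1 at time 1.5 hours with diet indicator 0.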

Prior to conducting a detailed analysis, we determined the optimal number of latent classes using the GIC with a penalty weight parameter of $η = 10$ . As illustrated in Figure 3, the model with $K = 3$ latent classes achieved the lowest GIC among the seven models considered. The figure also depicts how the negative log-likelihood decreases with increasing $K$ , though the improvement becomes marginal beyond $K = 3$ . At the same time, the number of parameters increases linearly with $K$ . The $K = 3$ model thus provided the best trade-off between model fit and complexity, requiring 26 parameters. Based on these observations, the $K = 3$ model was selected for further analysis.
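The parameter count and the selection rule can be illustrated with a short Python sketch. Two assumptions are made here and should be checked against the definitions given earlier in the paper. First, each class is taken to contribute two rate parameters and two coefficients per covariate, plus $K - 1$ free mixing proportions overall, which reproduces the 26 parameters reported for $K = 3$ with three covariates. Second, the GIC is taken in the common form $-2\ell + η p$, where $\ell$ is the maximized log-likelihood and $p$ the number of parameters. All function names are hypothetical.

```python
def n_parameters(n_classes, n_covariates=3):
    """Assumed parameter count of a latent NH-CTMC model with K classes.

    Per class: 2 rate parameters (a, b) and 2 coefficients per covariate
    (one for each transition direction). Overall: K - 1 free mixing
    proportions. With 3 covariates this gives 9K - 1, i.e. 26 for K = 3.
    """
    per_class = 2 + 2 * n_covariates
    return n_classes * per_class + (n_classes - 1)


def gic(log_likelihood, n_params, eta=10.0):
    """GIC in the assumed form -2 * log-likelihood + eta * n_params."""
    return -2.0 * log_likelihood + eta * n_params


def select_n_classes(log_likelihoods, eta=10.0):
    """Pick the K with the smallest GIC.

    log_likelihoods maps each candidate K to its maximized log-likelihood.
    Returns the selected K and the GIC value of every candidate.
    """
    scores = {
        k: gic(ll, n_parameters(k), eta) for k, ll in log_likelihoods.items()
    }
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

Under this form, each additional class must reduce the negative log-likelihood by more than $9η/2 = 45$ units to lower the GIC, which is consistent with the selection stopping once the improvement in fit becomes marginal.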

Figure 3.

GIC ( $η = 10$ ), negative log-likelihoods, and the numbers of parameters of latent NH-CTMC models from 1 to 7 latent classes, computed from DASH data. GIC: generalized information criterion; NH-CTMC: non-homogeneous continuous-time Markov chain; DASH: dietary approaches to stop hypertension.

The rate parameters $a$ and $b$ and the covariate coefficients $β$ for the three latent classes are presented in Table 3, along with their corresponding SEs in parentheses. Table 4 summarizes the demographic characteristics of the subjects assigned to each of the three latent classes. Model performance was assessed using 10-fold cross-validation, as illustrated in Figure 4. Each covariate was normalized to values between 0 and 1 using min-max scaling,⁵⁷ and the scaling bounds obtained from each training set were applied to the covariates in the corresponding test set during the cross-validation process. After model fitting, latent classes for individual subjects in the test sets were inferred from their latent class likelihoods, and real-time state predictions were generated using the corresponding class-specific parameters. The proposed model achieved a cross-validated state prediction accuracy of 86.73% with an area under the receiver operating characteristic curve (AUROC) of 95.38%. It substantially outperformed both the original single NH-CTMC model of Chang et al.,¹¹ which yielded an accuracy of 75.69% and an AUROC of 77.37%, and the longitudinal logistic regression model, which achieved an accuracy of 60.28% and an AUROC of 61.49%. The AUROC values were computed by evaluating the predicted probabilities of longitudinal binary states based on the estimated latent class-specific transition rate parameters. The ROC curves of the three models are presented in Figure 4.
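Two steps of this cross-validation procedure, covariate scaling and latent class assignment, can be sketched in Python as below. The sketch rests on assumptions: the function names are hypothetical, the class-specific log-likelihoods of a test subject are taken as given, and classes are assigned by the largest posterior probability proportional to $π_k$ times the class-$k$ likelihood. The text states only that classes were inferred from the latent class likelihoods, so the weighting by $π_k$ is an assumption.

```python
import math


def fit_min_max(train_values):
    """Scaling bounds (min, max) estimated on the training fold only."""
    return min(train_values), max(train_values)


def apply_min_max(values, bounds):
    """Scale values with previously fitted bounds.

    Test-fold values outside the training range fall outside [0, 1].
    """
    lo, hi = bounds
    if hi == lo:  # constant covariate in the training fold
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def class_posteriors(class_log_likelihoods, mixing_proportions):
    """Posterior class probabilities proportional to pi_k * L_k.

    Computed on the log scale (log-sum-exp) to avoid numerical underflow
    for long observation sequences.
    """
    log_w = [
        math.log(p) + ll
        for p, ll in zip(mixing_proportions, class_log_likelihoods)
    ]
    m = max(log_w)
    unnorm = [math.exp(w - m) for w in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def predict_class(class_log_likelihoods, mixing_proportions):
    """Class label (1-based) with the largest posterior probability."""
    post = class_posteriors(class_log_likelihoods, mixing_proportions)
    return max(range(len(post)), key=post.__getitem__) + 1
```

State predictions for a test subject would then use the parameters of the predicted class, as described in the text.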

Table 3.

Estimated coefficient values of the latent NH-CTMC model with three latent classes on the DASH dataset, with standard error (SE) values enclosed in parentheses.

	Class 1	Class 2	Class 3
$a$	0.4984 (0.2195)	3.2086 (0.4428)	3.2466 (0.2969)
$b$	0.4013 (0.3493)	3.0192 (0.4528)	2.8248 (0.4341)
$β_{01}^{diet}$	1.1183 (1.4120)	−0.8976 (0.5617)	1.2654 (1.0319)
$β_{10}^{diet}$	4.9830 (2.8395)	1.8345 (1.3147)	9.7988 (2.5447)
$β_{01}^{age}$	−4.4548 (1.8768)	1.5746 (1.0463)	1.8603 (1.5869)
$β_{10}^{age}$	−0.9620 (3.6933)	1.5943 (1.3173)	1.3076 (0.6661)
$β_{01}^{BMI}$	−7.1617 (4.5916)	3.9403 (2.5611)	1.4363 (1.2245)
$β_{10}^{BMI}$	−0.7417 (2.3276)	5.4173 (2.7474)	1.4299 (0.9531)
$π$	0.2146 (0.0256)	0.3511 (0.0399)	0.4342 (0.0405)

DASH: dietary approaches to stop hypertension; NH-CTMC: non-homogeneous continuous-time Markov chain; BMI: body mass index.

Table 4.

Demographic information for the DASH dataset, stratified by predicted latent class.

	Class 1	Class 2	Class 3
Number of subjects	77 (22.58%)	116 (34.02%)	148 (43.40%)
Age: Mean (SD)	46.47 (11.00)	43.25 (9.20)	46.40 (10.52)
Male: Freq. (%)	32 (41.56%)	56 (48.28%)	91 (61.49%)
Race: Freq. (%)
White	31 (40.26%)	41 (35.34%)	60 (40.54%)
Black	41 (53.25%)	69 (59.48%)	80 (54.05%)
Other	5 (6.49%)	6 (5.17%)	8 (5.41%)
Diet: Freq. (%)
Control	33 (42.86%)	28 (24.14%)	49 (33.11%)
Intermediate	16 (20.78%)	45 (38.79%)	55 (37.16%)
Reduced	28 (36.36%)	43 (37.07%)	44 (29.73%)
BMI: Mean (SD)	28.02 (3.87)	27.82 (3.81)	28.36 (3.73)
Smoke: Freq. (%)	30 (38.96%)	58 (50.00%)	51 (34.46%)
Alcohol: Mean (SD)	1.65 (3.01)	1.39 (2.68)	1.08 (2.13)

DASH: dietary approaches to stop hypertension; BMI: body mass index; SD: standard deviation.

Figure 4.

10-fold cross-validated ROC curves for the 3-latent class NH-CTMC model, single NH-CTMC model, and longitudinal logistic regression model applied to DASH data for predicting hypertensive states, presenting accuracy and AUC metrics. DASH: dietary approaches to stop hypertension; NH-CTMC: non-homogeneous continuous-time Markov chain; ROC: receiver operating characteristic curve; AUC: area under the curve.

The estimated coefficients and demographic distributions provide insight into the differences among the three latent classes identified in the DASH dataset. As shown in Table 3, Class 1 is characterized by relatively low transition rate parameters ($a^{(1)} = 0.4984$, $b^{(1)} = 0.4013$), indicating the slowest rate of change between hypertensive and normal states compared to the higher rates observed in Classes 2 and 3. Class 1 has the highest mean age (46.47 years, marginally above the 46.40 years of Class 3) and includes a lower proportion of smokers (38.96%) than Class 2. These characteristics are consistent with the slower transition rates observed in this group, as both older age and lower smoking prevalence may contribute to a slower progression towards hypertension. Class 2 exhibited high transition rate parameters ($a^{(2)} = 3.2086$, $b^{(2)} = 3.0192$), comparable to those of Class 3 and likely driven by its clinical and demographic composition. The class had the highest smoking proportion (50.00%) and was characterized by strong positive coefficients for BMI ($β_{01}^{BMI; (2)} = 3.9403$, $β_{10}^{BMI; (2)} = 5.4173$). The results suggest that elevated BMI, and possibly smoking status, in this group accelerate transitions between hypertensive and normal states. Class 3 also demonstrated dynamic transitions with high rate parameters ($a^{(3)} = 3.2466$, $b^{(3)} = 2.8248$), though its BMI coefficients were moderate compared to Class 2. The group was notable for its high proportion of male participants (61.49%) and a slightly higher average BMI (28.36) compared to the other groups. Across all classes, the positive values of $β_{10}^{diet}$ (ranging from $1.8345$ to $9.7988$) indicate that adherence to a reduced diet accelerates transitions from hypertensive to normal states. This effect is most pronounced in Class 3 ($β_{10}^{diet; (3)} = 9.7988$), suggesting the potential effectiveness of dietary interventions for this particular group.

Demographically, Table 4 shows that Class 3 had the largest proportion of subjects (43.40%), followed by Class 2 (34.02%) and Class 1 (22.58%). The classes exhibited subtle variations in age, with Class 1 and Class 3 participants being slightly older on average (46.47 and 46.40 years, respectively) compared to Class 2 (43.25 years). Gender composition varied across classes, with Class 3 having a higher proportion of male subjects (61.49%) compared to Class 1 (41.56%). Racial distributions were relatively similar, with Black participants representing the majority in all classes (53.25%, 59.48%, and 54.05% for Classes 1, 2, and 3, respectively).

The data analysis highlighted distinct transition dynamics and behavioral influences across the three latent classes identified by the NH-CTMC model. Class 1, characterized by older individuals with more stable health states, showed slower transition rates. In contrast, Class 2 exhibited highly dynamic transitions, potentially driven by higher smoking prevalence and stronger BMI effects. Class 3 demonstrated a balanced but responsive transition pattern, with pronounced dietary effects. These findings underscore the value of latent class modeling in revealing the nuanced relationship between demographic characteristics, behavioral factors, and transition dynamics within the NH-CTMC model framework, offering a coherent explanation for the observed differences in transition rates and covariate effects.

5. Discussion

CTMCs with finite state spaces find broad applications in modeling dynamic processes across diverse medical domains.^3–5,10 This study derived closely approximated closed-form transition probability functions for a two-state NH-CTMC and extended the framework to cluster subjects into latent groups with varying transition rates. For parameter estimation, we constructed a log-likelihood function, estimated the initial values for the optimization procedure using Algorithm 1, and utilized the L-BFGS-B method implemented in optimParallel for efficient numerical maximization. SEs were obtained from the inverse of the observed Hessian matrix to quantify uncertainty in the estimates. Computational findings underscored the robustness and efficiency of the proposed approach, and the application to the DASH study demonstrated its potential for real-time medical monitoring by capturing heterogeneous transition dynamics across latent classes.

The latent NH-CTMC model offers several advantages compared to other conventional longitudinal models. Unlike traditional CTMC models, the latent NH-CTMC captures varying rates of state transition events and monitors their trends. In contrast to longitudinal logistic regression models, the NH-CTMC maintains the transitioning nature of the process through its Markovian property. Moreover, compared to NH-CTMC models without latent classes, the latent model categorizes subjects into groups with different transition rates, enabling the capture of varying behaviors within each group and facilitating more accurate predictions based on the estimated latent class. The ability to cluster subjects provides inferences on time-varying transitions and supports demographic analysis, revealing behavioral and demographic factors influencing state transitions.

Our findings highlighted the computational efficiency of the proposed latent NH-CTMC model. While Kuo et al.²⁴ employed a latent CTMC framework and required simulations with 10,000 samples to obtain stable results for CPs and latent class prediction, our method incorporated the non-homogeneous nature of the Markov chain and achieved comparable stability even with a sample size as low as $n = 250$ . The significant reduction in computational burden underscores the model’s flexibility and adaptability for real-world applications.

The application of the latent NH-CTMC model to the DASH study dataset underscores its ability to perform latent class analysis, identifying distinct patterns of health state transitions and exploring how various demographic and behavioral factors influence these transitions. Through latent class modeling, we identified three distinct classes of subjects, each exhibiting unique transition dynamics based on factors such as age, BMI, and dietary intervention status. The model successfully categorized individuals into latent classes, offering insights into the relationship between covariates and the rates of transitions between normal and hypertensive states. This demonstrates the latent NH-CTMC model’s potential in both state prediction and latent class analysis, providing a powerful framework for uncovering hidden structures in longitudinal health data like the DASH study.

Despite its advantages, the latent NH-CTMC model presents certain limitations. First, the transition rate is defined by a fixed form of log-logistic function, while general longitudinal studies may involve various forms of transition rates. Second, the maximum likelihood estimation process can become computationally intensive and may yield inaccurate results when the number of latent classes increases, because the number of parameters grows with each added class, which can lead to suboptimal MLE outcomes. Future research directions for this work include the following: (a) exploring more general forms of transition rates and deriving the corresponding general transition probabilities, (b) reducing computational burden by investigating optimal parameter estimation methods, and (c) simplifying transition probabilities by reducing the number of parameters as the number of latent classes increases. By addressing these limitations, the latent NH-CTMC framework can further enhance its utility and applicability for modeling complex longitudinal processes in healthcare research.

Footnotes

Acknowledgments

The authors would like to acknowledge the support and resources provided by the University of Texas Health Science Center at Houston and Louisiana State University Health Sciences Center.

ORCID iDs

Joonha Chang

Wenyaw Chan

Ethical approval

We obtained ethical approval for this study from the Institutional Review Board (IRB) at the University of Texas Health Science Center at Houston.

Informed consent

All participants provided informed consent prior to inclusion in the study.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of conflicting interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data and code availability statement

The dataset used in this study was obtained from the Biologic Specimen and Data Repository Information Coordinating Center (BioLINCC). The Dietary Approaches to Stop Hypertension (DASH) study aimed to evaluate the impact of dietary patterns on lowering hypertension. To access data from the DASH study, visit https://biolincc.nhlbi.nih.gov/studies/dash/. A sample R script for simulating and estimating the latent NH-CTMC model is publicly available at the following GitHub repository: .

Appendices

References

1. Tijms. A first course in stochastic models. Chichester & New York: John Wiley and Sons, 2003.
2. Duffy, Chen, Tabar, et al. Estimation of mean sojourn time in breast cancer screening using a Markov chain model of both entry to and exit from the preclinical detectable phase. Stat Med 1995; 14: 1531–1543.
3. Uhry, Hédelin, Colonna, et al. Multi-state Markov models in cancer screening evaluation: a brief review and case study. Stat Methods Med Res 2010; 19: 463–486.
4. Chan, Tsai, et al. Analysis of transtheoretical model of health behavioral changes in a nutrition intervention study—a continuous time Markov chain model with Bayesian approach. Stat Med 2015; 34: 3577–3589.
5. Chan, Tilley. Continuous time Markov chain approaches for analyzing transtheoretical models of health behavioral change: A case study and comparison of model estimations. Stat Methods Med Res 2018; 27: 593–607.
6. Chen THH, Yen, Lai, et al. Evaluation of a selective screening for colorectal carcinoma: the Taiwan multicenter cancer screening (TAMCAS) project. Cancer: Interdiscip Inter J Am Cancer Soc 1999; 86: 1116–1128.
7. Commenges. Multi-state models in epidemiology. Lifetime Data Anal 1999; 5: 315–327.
8. Aalen, Borgan, Gjessing. Survival and event history analysis: a process point of view. New York, NY and London: Springer, 2008.
9. Johnson, Luecke. Nonhomogeneous, continuous-time Markov chains defined by series of proportional intensity matrices. Stoch Process Their Appl 1989; 32: 171–181.
10. Chen, Zhou. Non-homogeneous Markov process models with informative observations with an application to Alzheimer’s disease. Biometr J 2011; 53: 444–463.
11. Chang, Chan, Lin, et al. Non-homogeneous continuous-time Markov chain with covariates: Applications to ambulatory hypertension monitoring. Stat Med 2023; 42: 1965–1980.
12. Ngan, Chan, Leon-Novelo, et al. Estimation of non-monotonic transition rates in a semi-Markov process with covariates adjustments and application to caregivers’ stress data. Stat Med 2023; 42: 5646–5656.
13. Massey, Whitt. Uniform acceleration expansions for Markov chains with time-varying rates. Ann Appl Prob 1998; 8(4): 1130–1155.
14. Zeifman, Satin, Kovalev, et al. Facilitating numerical solutions of inhomogeneous continuous time Markov chains using ergodicity bounds obtained with logarithmic norm method. Mathematics 2020; 9: 42.
15. Řezníček, Kohlík, Kubátová. Accurate inexact calculations of non-homogeneous Markov chains. In: 2019 22nd Euromicro conference on digital system design (DSD), 2019, pp.470–477. IEEE.
16. Chan. Analysis of longitudinal multinomial outcome data. Biometr J 2006; 48: 319–326.
17. Řezníček, Kohlík, Kubátová. Non-homogeneous continuous time Markov chains calculations. In: 2020 23rd Euromicro conference on digital system design (DSD), 2020, pp.664–671. IEEE.
18. Vasilenko, Kugler, Butera, et al. Patterns of adolescent sexual behavior predicting young adult sexually transmitted infections: A latent class analysis approach. Arch Sex Behav 2015; 44: 705–715.
19. Hagenaars, McCutcheon. Applied latent class analysis. Cambridge & New York: Cambridge University Press, 2002.
20. Wang, Zhao, Therneau, et al. Unsupervised machine learning for the discovery of latent disease clusters and patient subgroups using electronic health records. J Biomed Inform 2020; 102: 103364.
21. Krivitsky, Handcock, Raftery, et al. Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models. Soc Networks 2009; 31: 204–213.
22. Mauricio, Lopez. A latent classification of male batterers. Violence Vict 2009; 24: 419–438.
23. Liang, Chan, Swartz, et al. Incorporating latent survival trajectories and covariate heterogeneity in time-to-event data analysis: a joint mixture model approach. BMC Med Res Methodol 2025; 25: 132.
24. Kuo, Chan, Leon-Novelo, et al. Latent classification model for censored longitudinal binary outcome. Stat Med 2024; 43: 3943–3957.
25. Appel, Moore, Obarzanek, et al. A clinical trial of the effects of dietary patterns on blood pressure. New Engl J Med 1997; 336: 1117–1124.
26. Atkinson. An introduction to numerical analysis. New York: John Wiley & Sons, 1991.
27. Bolker. Maximum likelihood estimation and analysis with the bbmle package. R package version 1.0, 2014.
28. Nelder, Mead. A simplex method for function minimization. Comput J 1965; 7: 308–313.
29. Gerber, Furrer. optimParallel: An R package providing a parallel version of the L-BFGS-B optimization method. R J 2019; 11: 352–358.
30. Liu, Nocedal. On the limited memory BFGS method for large scale optimization. Math Program 1989; 45: 503–528.
31. Zhu, Byrd, et al. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans Math Software (TOMS) 1997; 23: 550–560.
32. Fletcher. Practical methods of optimization. Chichester & New York: John Wiley & Sons, 2000.
33. Fisher. On the mathematical foundations of theoretical statistics. Philosop Trans R Soc London Ser A, Contain Papers Math Phys Char 1922; 222: 309–368.
34. Ferguson. A course in large sample theory. New York: Routledge (Taylor & Francis), 2017.
35. MacQueen, et al. Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, 1967, pp.281–297. Oakland, CA, USA.
36. Lewis, Shedler. Simulation of nonhomogeneous Poisson processes by thinning. Naval Res Logist Quart 1979; 26: 403–413.
37. Saltzman, Drew, Leemis, et al. Simulating multivariate nonhomogeneous Poisson processes using projections. ACM Trans Model Comput Simul (TOMACS) 2012; 22: 1–13.
38. Bhansali, Downham. Some properties of the order of an autoregressive model selected by a generalization of Akaike’s EPF criterion. Biometrika 1977; 64: 547–551.
39. Stoica, Selen. Model-order selection: a review of information criterion rules. IEEE Signal Process Mag 2004; 21: 36–47.
40. Akaike. A new look at the statistical model identification. IEEE Trans Automat Contr 1974; 19: 716–723.
41. Sakamoto, Ishiguro, Kitagawa. Akaike information criterion statistics. Dordrecht, Netherlands: D Reidel 1986; 81: 26853.
42. Genolini, Falissard. KmL: k-means for longitudinal data. Comput Stat 2010; 25: 317–328.
43. Teuling, Pauws, Heuvel Evd. latrend: A framework for clustering longitudinal data. arXiv preprint arXiv:2402.14621 2024.
44. Scrucca, Fop, Murphy, et al. mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. R J 2016; 8: 289.
45. Verbeke, Lesaffre. A linear mixed-effects model with heterogeneity in the random-effects population. J Am Stat Assoc 1996; 91: 217–221.
46. Muthén, Shedden. Finite mixture modeling with mixture outcomes using the EM algorithm. Biometrics 1999; 55: 463–469.
47. Proust-Lima, Philipps, Liquet. Estimation of extended mixed models using latent classes and latent processes: The R package lcmm. J Stat Softw 2017; 78: 1–56.
48. Sidney, Rosamond, Howard, et al. The heart disease and stroke statistics—2013 update and the need for a national cardiovascular surveillance system, 2013.
49. George, MacDonald. Home blood pressure monitoring. European Cardiol Rev 2015; 10: 95.
50. Jain, Minhas AMK, Morris, et al. Demographic and regional trends of heart failure-related mortality in young adults in the US, 1999-2019. JAMA Cardiol 2022; 7: 900–904.
51. Stierman, Afful, Carroll, et al. National Health and Nutrition Examination Survey 2017–March 2020 prepandemic data files—Development of files and prevalence estimates for selected health outcomes. National Health Statistics Reports. No. 158, 2021.
52. Ritchey, Loustalot, Bowman, et al. Trends in mortality rates by subtypes of heart disease in the United States, 2000-2010. J Am Med Assoc 2014; 312: 2037–2039.
53. Lan. Global, regional, and national burden of hypertensive heart disease during 1990–2019: an analysis of the Global Burden of Disease Study 2019. BMC Public Health 2022; 22: 1–10.
54. Buford. Hypertension and aging. Ageing Res Rev 2016; 26: 96–111.
55. Sabaka, Dukat, Gajdosik, et al. The effects of body weight loss and gain on arterial hypertension control: an observational prospective study. Eur J Med Res 2017; 22: 1–7.
56. Whelton, Carey, Aronow, et al. 2017 ACC/AHA/AAPA/ABC/ACPM/AGS/APhA/ASH/ASPC/NMA/PCNA guideline for the prevention, detection, evaluation, and management of high blood pressure in adults: a report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. J Am Coll Cardiol 2018; 71: e127–e248.
57. Han, Kamber, Tong. Data Mining: Concepts and Techniques. 3rd ed. Waltham, MA: Morgan Kaufmann Publishers, 2012.