Abstract

The identification of latent profile trajectories in longitudinal studies represents an important challenge for specialists, since such trajectories could provide insights to better understand their problem of interest. The majority of the statistical methodologies for cluster analysis of longitudinal data are based on growth curve or mixed-effects models, and often incorporate covariates for a better adjustment. In particular, among Bayesian nonparametric methods, Dirichlet process mixture models are widely used. We propose a clustering methodology for longitudinal data based on mixture models generated by a discrete random probability measure whose weights are decreasingly ordered by construction. Additionally, data are modeled without making use of covariates and assuming independence across time for individual measurements. Our approach also provides a straightforward procedure to merge some of the estimated groups, since there may be too many of them to be easily explained by experts. Our results suggest that, at least for a first analysis, this framework is enough to effectively detect groups in the data; further exploration of each group could incorporate extra information. We apply our methodology to detect adiposity trajectories in Mexican children in a secondary analysis of the “Prenatal Omega-3 fatty acid Supplementation and Child Growth and Development” study (POSGRAD) cohort.

Keywords

Bayesian nonparametrics, decreasing-weight mixture models, latent profile analysis, repeated measurements

1. Introduction

There are a number of areas with working scenarios, for example, epidemiological, medical, biological or psychological studies, leading to individual growth trajectories or longitudinal profiles.^1–4 Researchers are often interested in understanding their dynamics, that is, the development or tendency over time of the resulting profiles. Additionally, there could be subsets of individuals whose growth profiles are significantly different from the overall estimates. Therefore, it might be of interest to find patterns for these heterogeneous longitudinal data sets.

One can handle this situation from a statistical learning perspective as an unsupervised classification problem, where the aim is to identify classes of trajectories. The estimated classes allow one to disaggregate a larger heterogeneous population into homogeneous subpopulations, which would pinpoint meaningful groups of more similar individual profiles. These groups could, in turn, be used as predictions, either to follow the change in time of the relevant variable or to propose other predictive or causal models within the latent classes.

Moreover, this approach could contribute to improving the analysis of information originating from longitudinal study designs, such as cohort studies, clinical trials or interventional studies, where the latter would have at least two measurements in time for each individual. In these schemes, the aim is often to study possible changes in patterns in order to understand how particular events develop and evolve through time in different areas,⁵ for example, in nutrition, health and social behavior, amongst others. This enables the identification of tendencies, changes and profile behavior in general, and improves the understanding of such events, for instance in the life course of the health-disease process. Despite the striking value of such information, only a few longitudinal studies consider the identification of profile patterns or trajectories in their analyses.^6,7

The statistical modeling of longitudinal data has often been performed through growth curve models^3,4,8–11 or mixed-effects models,^2,12 with a strong emphasis on the estimation of the mean function. To capture the phenomenon-driven dynamics, a very flexible mean function should be considered; therefore, there have been many efforts to generate ad hoc proposals for a particular data behavior.^13–15 Additionally, it is also common to incorporate covariates in the estimation of either the mean or the covariance function.¹⁶ Similarly, other models treat observations as time dependent, leading to, for example, autoregressive mean functionals.^17–19 From a different perspective, longitudinal data have also been handled by computational methods.^20–22

In these methodologies, the identification of classes of trajectories helps to reveal common patterns in subgroups of the data, which is achieved through the detection of the underlying clusters making up the population. Cluster analysis for mixed-effects models is performed via latent class models, where random-effects variables or the mean functional parameters play the role of classification variables.^14,23 Classical^6,18,24,25 and Bayesian approaches have been developed for this purpose. For the latter, the use of random probability measures (RPMs) is a common tool,^15–17,19,23,26,27 in particular the Dirichlet process.²⁸ One of the advantages of this approach is that the number of latent classes is inferred from the data; furthermore, these models can be expressed as mixture models, a very well-known method for cluster analysis, which also allows groups to be characterized probabilistically. However, in some proposals, there are parametric assumptions, hence they need to deal with estimation challenges such as label switching, trapping in local maxima, and so on.

In this work, we consider a continuous outcome where the main interest is to identify patterns and classify the individual profiles accordingly. Unlike common methodologies for modeling longitudinal responses, we do not impose elaborate dependency structures to describe individual data, nor do we consider covariates. Our results suggest that we can dispense with them, at least in a first stage when the main interest is clustering. Adopting a Bayesian nonparametric approach, patterns in the observed profiles are inferred based on a mixture model whose underlying RPM has weights that are decreasingly ordered by construction. The first example of decreasing-weight RPMs is the geometric process,²⁹ which has already been applied to different estimation procedures (see, e.g. Fuentes-García et al.,^30,31 and Gutiérrez et al.³²), and has also been extended to allow a more flexible weight structure.³³ An advantage of this type of process, when compared with the Dirichlet process, is that the order in the weights serves as a constraint diminishing non-identifiability issues. Also, these processes seem to converge faster in the density estimation context, cf. Fuentes-García et al.³⁰ and De Blasi et al.,³³ in the sense that fewer iterations are required to recover the mixing components.

To the best of our knowledge, there is no literature on clustering based on decreasing-weight RPMs. This could be due to the fact that the geometric process tends to use a large number of components; however, the generalization provided by De Blasi et al.³³ overcomes this issue, and we wish to test it for the specific task of clustering. Additionally, since the number of estimated classes might be too large to be interpreted by expert users, we also present a straightforward procedure to fuse some groups by taking into account their estimated profile densities.

One motivation behind the proposed methodology is a better understanding of childhood obesity in developing countries, like Mexico, which would support strategies contributing to its prevention. With this in mind, we consider a secondary data analysis for $286$ adolescents from the prospective birth cohort “Prenatal Omega-3 fatty acid Supplementation and Child Growth and Development” (POSGRAD) study.³⁴ The data were analyzed separately by biological sex; children grow rapidly at the measured ages, hence there is only a mild dependence on previous measurements regarding the variable of interest. We identified three main profile classes.

2. Model framework

The detection of patterns in a given data set is the main goal of cluster analysis. These patterns aim to partition the data such that items gathered together into a cluster or group share some characteristics, while items in different clusters are different. Due to its generality, cluster analysis has received great attention in diverse fields and, as a consequence, there is a vast amount of literature. Mixture modeling is perhaps the most common methodology to perform cluster analysis. Mathematically, a mixture density $f$ is a convex combination of probability densities, $g (\cdot ∣ θ_{j})$, which can be written as

$$f (y) = \sum_{j = 1}^{L} w_{j} g (y ∣ θ_{j}),$$

where $L$ stands for the number of components. Depending on the specific method, $L$ is fixed or random, and it can be finite or infinite. The clustering structure induced by this class of models is clearer when a membership variable $d$ is introduced. This variable associates each observation $y$ with one of the $L$ mixture components. Defining $\Pr (d = j) = w_{j}$, for $j = 1, \dots, L$, the augmented model is given by

$$f (y ∣ d) = g (y ∣ θ_{d}) .$$

In a broad sense, the $L$ mixture components can all be different, that is, there is a collection of densities $(g_{1}, \dots, g_{L})$; in practice, a single density is chosen, which is completely specified by its parameter $θ_{j}$, so we have $g_{j} (\cdot) := g (\cdot ∣ θ_{j})$. Therefore, for a given data set $y_{1}, \dots, y_{n}$, there are $n$ membership variables $d_{1}, \dots, d_{n}$, from which the induced clusters $C_{j}$ can be defined as

$$C_{j} = \{y_{i} : d_{i} = j, \; i = 1, \dots, n\}, \quad j = 1, \dots, L .$$

Furthermore, all the observations belonging to a particular cluster $C_{j}$ are modeled by the same probability density $g (\cdot ∣ θ_{j})$. This latter feature is what makes the clustering procedure model-based.

The choice of the value for the number of components, $L$, entails different issues relevant for the clustering. For example, fixing a small value might underestimate the number of nonempty groups. Hence, it is common to fit mixture models with different values of $L$ and select the best option. On the other hand, following a Bayesian approach, an alternative is to treat $L$ as random, assign it a prior distribution and infer its value. Another direction is obtained by using RPMs like the Dirichlet process,²⁸ placing the resulting methodologies in the realm of Bayesian nonparametrics. In practice, here we allow the mixture to have, potentially, an infinite number of components a priori and, due to the underlying learning process of Bayesian models, only some of them are effectively used a posteriori. This also lets us infer the number of nonempty groups.

Besides the Dirichlet process, there exist other RPMs widely used in Bayesian nonparametrics, for example, the two-parameter Poisson–Dirichlet process³⁵ and the generalized Gamma process,³⁶ which are particular cases of Gibbs-type priors (see, e.g. De Blasi et al.³⁷), and others exhibiting a more complex structure, like the one presented by Gil-Leyva and Mena.³⁸ The choice of the RPM determines, among other things, the structure of the mixing weights $w_{j}$, $j = 1, 2, \dots$, and consequently, of the clustering.

Our selection uses an infinite-component mixture model whose mixing weights are ordered almost surely by construction, meaning that $w_{j} > w_{j + 1}$ for $j = 1, 2, \dots$. One example of this class of models is obtained from the so-called geometric process,²⁹ where the weights exhibit a geometric decay, that is $w_{j} := λ (1 - λ)^{j - 1}$ for some $(0, 1)$-valued random variable $λ$. We take a generalization of this process, based on a negative binomial distribution,³³ which has a more flexible decreasing form than the geometric process. By imposing the ordering constraint on the mixing weights, we expect to diminish the effect of the unidentifiability issue inherent to mixture models.

The mixture model we will use is, therefore, the following:

$$f (y) = \sum_{j = 1}^{\infty} w_{j} g (y ∣ θ_{j}), \tag{1}$$

where the mixing weights are such that $w_{j} > w_{j + 1}$ for $j = 1, 2, \dots$, and add up to one almost surely. For the sake of completeness, the derivation of these weights is presented in the Supplementary Material. The key idea is to start from an $r$-component mixture model with uniform weights, that is $v_{j} = 1 / r$ for $j = 1, \dots, r$, and to treat $r$ as random, distributed according to some $π$ taking values over $\mathbb{N}$. After marginalizing over $r$, the $j$th weight has the form

$$w_{j} = \sum_{r = j}^{\infty} \frac{π (r)}{r}, \quad j = 1, 2, \dots . \tag{2}$$

A particular distribution of interest for $π$ is the negative binomial distribution, whose probability function is given by

$$π (r ∣ s, λ) = \binom{r + s - 2}{r - 1} λ^{s} (1 - λ)^{r - 1}, \quad r = 1, 2, \dots . \tag{3}$$

In addition, it is necessary to assign a prior distribution to the parameter $λ$ in order to have a dense model over the sample space. For this class of weights, the parameter $s$ controls the decreasing structure of the sequence; the geometric process is recovered by setting $s = 2$, since then $π (r) / r = λ^{2} (1 - λ)^{r - 1}$ and Equation (2) sums to $λ (1 - λ)^{j - 1}$. We refer the reader to De Blasi et al.³³ for further details.
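As a rough numerical illustration of Equations (2) and (3), the following sketch computes the first few weights by truncating the infinite sum. It assumes the negative binomial probability function carries the exponent $r - 1$ on $(1 - λ)$, which is the form under which $s = 2$ gives back the geometric weights $λ (1 - λ)^{j - 1}$; the function names, the truncation level `r_max` and the integer restriction on `s` are choices of this sketch, not part of the original methodology.

```python
from math import comb


def neg_binomial_pmf(r, s, lam):
    """pi(r | s, lam) for r = 1, 2, ..., with exponent r - 1 on (1 - lam).

    `s` must be a positive integer here because math.comb is used.
    """
    return comb(r + s - 2, r - 1) * lam**s * (1.0 - lam) ** (r - 1)


def decreasing_weights(s, lam, n_weights, r_max=2000):
    """Approximate w_1, ..., w_{n_weights} from w_j = sum_{r >= j} pi(r) / r.

    The infinite sum is truncated at r_max (an assumption of this sketch).
    """
    weights = [0.0] * n_weights
    tail = 0.0
    # Walk r downwards so that `tail` always holds sum_{r' >= r} pi(r') / r'.
    for r in range(r_max, 0, -1):
        tail += neg_binomial_pmf(r, s, lam) / r
        if r <= n_weights:
            weights[r - 1] = tail
    return weights


# s = 2 should reproduce the geometric weights; s = 5 decays more flexibly.
w_geo = decreasing_weights(s=2, lam=0.3, n_weights=5)
w_nb = decreasing_weights(s=5, lam=0.3, n_weights=5)
```

Because $w_{j} - w_{j + 1} = π (j) / j > 0$, the ordering constraint holds for any choice of $s$ and $λ$, which is what the sketch lets one check numerically.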

2.1. Modeling longitudinal responses

Observations are modeled through the densities $g (\cdot ∣ θ_{j})$, $j = 1, \dots, k$, in mixture models. Since we are interested in clustering data arising from longitudinal studies or repeated measurements, every single observation $y_{i}$ consists of a sequence of $m$ measurements $y_{i, t_{1}}, \dots, y_{i, t_{m}}$, recorded at times $t_{1}, \dots, t_{m}$, with $t_{j} < t_{j + 1}$ for $j = 1, \dots, m - 1$. The complete data set is, thus, formed by a sequence of $m$-dimensional observations $y_{1}, \dots, y_{n}$.

In the literature, some dependency structure is usually imposed within each observation, for example, a linear model or a stochastic process. However, we follow a simpler approach since we think it is enough when the primary task is clustering. Therefore, we use an $m$-variate distribution for the data model and, for simplicity, we take $t_{l} := l$. Moreover, there is enough distance between measurements to allow for a weak time dependence structure in the variable of interest; hence we assume independence among all the recorded points in each observation $y_{i}$, and a Gaussian distribution is used for modeling each measurement $y_{i, t}$. The clustering probabilities are determined jointly by the data points at all times, and they increase as the likelihood across all times increases; this indicates a higher similarity in the whole response.

2.2. Posterior sampling scheme

Augmenting the mixture model in Equation (1) by means of a membership variable, $d_{i}$, as explained at the beginning of this section, we obtain a hierarchical description of the model that is useful for sampling:

$$\begin{aligned} y_{i, t} ∣ θ, d & \overset{\text{ind}}{\sim} N (y_{i, t} ∣ θ_{d_{i}, t}), \quad i = 1, \dots, n, \; t = 1, \dots, m, \\ θ_{j, t} & \overset{\text{iid}}{\sim} ν_{0} (θ_{j, t} ∣ ϕ_{t}), \quad j = 1, \dots, J, \\ \Pr (d_{i} = j ∣ w_{j}) & = w_{j} (r_{i}, λ) \quad \text{(iid)}, \end{aligned}$$

where $J = \max (r_{1}, \dots, r_{n})$ and, for simplicity, $θ = (θ_{1}, \dots, θ_{J})$ and $d = (d_{1}, \dots, d_{n})$ in the first line, with $θ_{j} = (θ_{j, 1}, \dots, θ_{j, m})$ and $θ_{j, t} = (μ_{j, t}, τ_{j, t})$. The dependency of the weights $w_{j}$ on the latent variables $r$ and $λ$ has been indicated in the last line (cf. Equations (2) and (3)). At each instant $t$, each observation $y_{i, t}$ follows a Gaussian distribution with mean $μ_{j, t}$ and variance $1 / τ_{j, t}$. We choose a conjugate prior $ν_{0}$, a Normal-Gamma distribution with parameter $ϕ_{t} = (ω_{t}, c_{t}, a_{t}, b_{t})$, that is

$$μ_{j, t} ∣ τ_{j, t} \sim N (μ_{j, t} ∣ ω_{t}, c_{t} / τ_{j, t}), \quad τ_{j, t} \sim G (a_{t}, b_{t}) .$$

The distribution for the membership variables, $d_{i}$, is a discrete uniform, and the prior for the variable $r_{i}$ is the negative binomial given in Equation (3), for $i = 1, \dots, n$. Finally, the parameter $λ$ follows a beta prior distribution.
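To make the role of the conjugate Normal-Gamma prior concrete, the sketch below applies the textbook conjugate update for the pair $(μ_{j, t}, τ_{j, t})$ given the measurements at time $t$ of the observations currently assigned to group $j$. It is not taken from the supplementary material: it assumes that $G (a, b)$ is parameterized by shape and rate and that $c / τ$ is the prior variance of the mean, and the function names are hypothetical.

```python
import random


def normal_gamma_posterior(y, omega, c, a, b):
    """Standard conjugate update; returns posterior (omega_n, c_n, a_n, b_n).

    Assumes mu | tau ~ N(omega, c / tau) and tau ~ Gamma(shape=a, rate=b).
    """
    n = len(y)
    if n == 0:
        return omega, c, a, b  # empty group: the prior is left unchanged
    ybar = sum(y) / n
    ss = sum((v - ybar) ** 2 for v in y)
    kappa0 = 1.0 / c  # prior "pseudo sample size" attached to the mean
    kappa_n = kappa0 + n
    omega_n = (kappa0 * omega + n * ybar) / kappa_n
    a_n = a + n / 2.0
    b_n = b + 0.5 * ss + 0.5 * kappa0 * n * (ybar - omega) ** 2 / kappa_n
    return omega_n, 1.0 / kappa_n, a_n, b_n


def sample_mu_tau(omega, c, a, b, rng=random):
    """Draw (mu, tau) from a Normal-Gamma with the given parameters."""
    tau = rng.gammavariate(a, 1.0 / b)  # gammavariate expects shape and scale
    mu = rng.gauss(omega, (c / tau) ** 0.5)
    return mu, tau


# Example with the hyperparameters phi_t = (0, 40, 0.1, 0.1) used later on.
post = normal_gamma_posterior([1.0, 2.0, 3.0], 0.0, 40.0, 0.1, 0.1)
mu_draw, tau_draw = sample_mu_tau(*post, rng=random.Random(1))
```

With $c_{t} = 40$ the prior contributes very little to the posterior mean, which is consistent with the intention of allowing a large prior variance.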

Posterior samples can be obtained from a Gibbs sampler. The mixing weights are updated through their latent variables $r$ and $λ$, which also define the updating of the membership variables. The details are provided in the supplementary material. We would only like to highlight that the full conditional distribution for the membership variables is given by

$$p (d_{i} = j ∣ \dots) \propto g (y_{i} ∣ θ_{j}) = \prod_{t = 1}^{m} N (y_{i, t} ∣ μ_{j, t}, 1 / τ_{j, t}), \quad j = 1, \dots, r_{i},$$

for $i = 1, \dots, n$. As long as enough density is jointly received at each time instant $t$, these probabilities are sufficient to allocate a full response $y_{i}$ to the $j$th cluster.
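The membership update described above can be sketched as follows: for a profile $y_{i}$, the product of Gaussian densities over the $m$ time points is evaluated for each of the first $r_{i}$ components, normalized, and used to draw $d_{i}$. Working on the log scale is a numerical choice of this sketch; the function names and the list-of-lists layout `mu[j][t]`, `tau[j][t]` are illustrative assumptions.

```python
import math
import random


def log_normal_pdf(y, mu, tau):
    """Log density of N(mu, 1 / tau) evaluated at y (tau is a precision)."""
    return 0.5 * math.log(tau / (2.0 * math.pi)) - 0.5 * tau * (y - mu) ** 2


def membership_probs(y_i, mu, tau, r_i):
    """Normalized p(d_i = j | ...) for j = 1, ..., r_i.

    mu[j][t] and tau[j][t] hold the kernel parameters of component j + 1.
    """
    log_lik = [
        sum(log_normal_pdf(y, m, p) for y, m, p in zip(y_i, mu[j], tau[j]))
        for j in range(r_i)
    ]
    top = max(log_lik)  # subtract the maximum to avoid underflow
    unnorm = [math.exp(v - top) for v in log_lik]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def sample_membership(y_i, mu, tau, r_i, rng=random):
    """Draw d_i in {1, ..., r_i} from its full conditional."""
    probs = membership_probs(y_i, mu, tau, r_i)
    return rng.choices(range(1, r_i + 1), weights=probs, k=1)[0]


# Toy example with m = 2 time points and three available components.
mu = [[0.0, 0.0], [5.0, 5.0], [0.0, 0.0]]
tau = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
probs = membership_probs([0.0, 0.1], mu, tau, r_i=2)
d_i = sample_membership([0.0, 0.1], mu, tau, r_i=2, rng=random.Random(0))
```

Note that only the first $r_{i}$ components compete for observation $i$, so the third component in the toy example is ignored when $r_{i} = 2$.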

Furthermore, the samples of these membership variables allow us to infer the clustering structure underlying the data. As explained at the beginning of this section, a clustering is a partition $C$ of the data into $k$ nonempty groups $C_{j}$, $j = 1, \dots, k$, so $C = \{C_{1}, \dots, C_{k}\}$. At each iteration $l$ of the Gibbs sampler, our method generates a clustering $C^{(l)}$ having a random number of clusters $k^{(l)}$, which equals the number of different membership values $d_{i}^{(l)}$, $i = 1, \dots, n$. All these clusterings constitute a sample from the posterior distribution of the random partition $C$, and we need to provide some point estimate. We will use the posterior modal partition, which will be denoted by $\tilde{C}$.

For every sampled clustering $C$, the Gibbs sampler also provides values for the distribution parameters $θ_{j}$, from which it is possible to estimate some functional of interest. In this case, we can estimate a mean profile for each group in $C$, which summarizes or characterizes the observations associated with it. Similarly, probability bands can be computed.
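One simple way to estimate the posterior modal partition from the sampled membership vectors is to relabel each vector so that two vectors inducing the same partition become identical, and then take the most frequent one. The sketch below does this; the relabeling by order of first appearance and the function names are assumptions of this illustration, not a description of the authors' implementation.

```python
from collections import Counter


def canonical(labels):
    """Relabel groups by order of first appearance: [3, 3, 5] -> (1, 1, 2)."""
    mapping = {}
    return tuple(mapping.setdefault(d, len(mapping) + 1) for d in labels)


def posterior_modal_partition(samples):
    """Most frequent partition among the samples and its relative frequency."""
    counts = Counter(canonical(d) for d in samples)
    partition, freq = counts.most_common(1)[0]
    return partition, freq / len(samples)


# Four sampled membership vectors for n = 3 observations; three of them
# induce the same partition {1, 2}, {3} under different labels.
samples = [[1, 1, 2], [2, 2, 1], [1, 2, 2], [3, 3, 5]]
modal, prob = posterior_modal_partition(samples)
```

The relative frequency returned alongside the partition plays the role of the estimated posterior probability of the modal partition.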

2.3. Merging clusters

Although the posterior modal partition $\tilde{C}$ has a direct probabilistic interpretation, there might be some circumstances, inherent to the specific problem under study, where a simpler clustering structure is desirable. We take advantage of the additional information provided by our model to compute a simplified version of the clustering $\tilde{C}$.

A very widely used distance-based approach for cluster analysis is the hierarchical clustering method. In the literature, there are different measures that can be used to determine the clustering structure evolution, and the experimenter selects one by fixing a cut-off value. Some of these measures are defined in terms of the distance between pairs of data points, and others incorporate a model-based approach. In the latter, each group is modeled by a particular probability distribution, and the cluster hierarchies are built according to the distance between pairs of distributions (see, e.g. Heller and Ghahramani³⁹).

Following this idea, given a posterior estimate $\hat{C}$, the estimated densities of its groups, say ${\hat{g}}_{j}$ for $j = 1, \dots, \hat{k}$, are compared by means of the symmetrized Kullback–Leibler (KL) divergence. Small values of this distance for some pair of densities indicate that their respective groups can be merged into a single one. The KL divergence is defined, for two continuous probability densities $f_{1}$ and $f_{2}$, as

$$D_{KL} (f_{1}, f_{2}) = \int_{- \infty}^{\infty} \log \left(\frac{f_{1} (x)}{f_{2} (x)}\right) f_{1} (x) \, d x .$$

Its symmetrized version is

$$KL (f_{1}, f_{2}) = D_{KL} (f_{1}, f_{2}) + D_{KL} (f_{2}, f_{1}),$$

and can be used to measure the distance between $f_{1}$ and $f_{2}$. Thus, we can compute an agglomerative hierarchical clustering of the estimated groups ${\hat{C}}_{j}$, $j = 1, \dots, \hat{k}$, through ${\hat{g}}_{j}$. The new clustering structure, say ${\hat{C}}^{*}$, will contain at most $\hat{k}$ clusters, each of which will be one, or the union of some, of those in $\hat{C}$.

For the particular mixture model detailed before, each mixture component has an $m$ -variate Gaussian density, $g_{j}$ , of parameters $(μ_{j}, Σ_{j})$ . In particular, the vector mean is $μ_{j} = (μ_{j, 1}, \dots, μ_{j, m})$ and the covariance matrix is diagonal, that is $Σ_{j} = diag (1 / τ_{j, 1}, \dots, 1 / τ_{j, m})$ , for $j = 1, \dots, J$ . In this case, the symmetrized KL divergence has an explicit form; for $j_{1} \neq j_{2}$ ,

$$KL (g_{j_{1}}, g_{j_{2}}) = \frac{1}{2} \sum_{t = 1}^{m} \left[\frac{τ_{j_{2}, t}}{τ_{j_{1}, t}} + \frac{τ_{j_{1}, t}}{τ_{j_{2}, t}} + {(μ_{j_{1}, t} - μ_{j_{2}, t})}^{2} (τ_{j_{1}, t} + τ_{j_{2}, t})\right] - m . \tag{4}$$

Its Monte Carlo estimate can be computed from the posterior sample of kernel parameters $(μ_{j}, Σ_{j})$ given the posterior modal partition $\tilde{C}$. Finally, the hierarchical clustering scheme can be depicted in a dendrogram and used to determine the simplified clustering ${\hat{C}}^{*}$.
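The closed form in Equation (4) is straightforward to evaluate. The following sketch computes the symmetrized KL divergence between two groups with diagonal Gaussian kernels and identifies the closest pair, which would be the first candidates for merging. It covers only this first step, not a full agglomerative scheme with a linkage rule, and it takes point estimates of $(μ_{j, t}, τ_{j, t})$ as given; the function names are hypothetical.

```python
def symmetrized_kl(mu1, tau1, mu2, tau2):
    """Symmetrized KL divergence of Equation (4) for diagonal Gaussians.

    mu1, mu2 are mean vectors and tau1, tau2 precision vectors of length m.
    """
    m = len(mu1)
    total = 0.0
    for a, p, b, q in zip(mu1, tau1, mu2, tau2):
        total += q / p + p / q + (a - b) ** 2 * (p + q)
    return 0.5 * total - m


def closest_pair(mus, taus):
    """Return (divergence, j1, j2) for the pair of groups that is closest."""
    best = None
    for j1 in range(len(mus)):
        for j2 in range(j1 + 1, len(mus)):
            dist = symmetrized_kl(mus[j1], taus[j1], mus[j2], taus[j2])
            if best is None or dist < best[0]:
                best = (dist, j1, j2)
    return best


# Three toy groups with m = 2; the first two have nearly identical means.
mus = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]]
taus = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
best = closest_pair(mus, taus)
```

As a sanity check, the divergence is zero for identical groups and equals one for two univariate unit-variance Gaussians whose means differ by one.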

3. Simulation analysis

We illustrate the performance of our methodology in this section using simulated data. Qin⁴⁰ presents a method for clustering gene expression profiles, and tests it using a simulated data set which can be applied in our context. Let us define five longitudinal profiles through the following functions:

A periodic function given by

$$y_{i, j} = (k / 2) \sin (j k π / 10 - k π / 4) + ϵ_{i, j};$$

three profiles are created by taking $k = 1, 2, 3$.

A monotone increasing function

$$y_{i, j} = - 1 + j / 10 + ϵ_{i, j} .$$

A constant function

$$y_{i, j} = a_{i, j} + ϵ_{i, j},$$

where $a_{i, j}$ is a continuous uniform random variable between $- 1$ and $1$.

Thus, the data set we use is formed by drawing $30$ samples from each profile, each one observed along $m = 20$ time instants; the sample size is $n = 150$. In all cases, $ϵ_{i, j}$ is Gaussian noise with zero mean and variance $σ^{2} = 0.3$. Notice that the periodic function with $k = 1$ and the constant one produce similar sampled profiles after adding the noise; therefore, it could be harder to cluster them correctly.
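A data set of this form can be generated with a few lines of code. The sketch below follows the five profile definitions literally, assuming that the time index runs over $j = 1, \dots, m$ and that $a_{i, j}$ is drawn independently for every pair $(i, j)$; the function name, the seed and the label coding are choices of this illustration.

```python
import math
import random


def simulate_profiles(n_per_profile=30, m=20, sigma2=0.3, seed=0):
    """Simulate five groups of noisy longitudinal profiles.

    Returns (data, labels): data[i] is a list of m values and labels[i]
    in {1, ..., 5} records the generating profile.
    """
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)

    # Deterministic mean functions: three periodic ones and a linear trend.
    means = [
        lambda j, k=k: (k / 2) * math.sin(j * k * math.pi / 10 - k * math.pi / 4)
        for k in (1, 2, 3)
    ]
    means.append(lambda j: -1 + j / 10)

    data, labels = [], []
    for label, mean in enumerate(means, start=1):
        for _ in range(n_per_profile):
            data.append([mean(j) + rng.gauss(0.0, sd) for j in range(1, m + 1)])
            labels.append(label)

    # Fifth profile: uniform level a_{i,j} on (-1, 1) plus Gaussian noise.
    for _ in range(n_per_profile):
        data.append(
            [rng.uniform(-1.0, 1.0) + rng.gauss(0.0, sd) for _ in range(m)]
        )
        labels.append(5)
    return data, labels


data, labels = simulate_profiles()
```

The true labels are kept only to assess the recovered clustering afterwards; they are not used by the clustering method itself.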

The Gibbs sampler is run using the following setup. A sample of $2000$ iterations is taken after discarding the first $5000$ as a burn-in period. For all time instants $t$, the same hyperparameters are used, that is $ϕ_{t} = (0, 40, 0.1, 0.1)$ for $t = 1, \dots, m$. For the parameter $λ$, a Beta prior with parameters $(1.1, 1.1)$ is used. Moreover, several values of the parameter $s$ are considered to test the effect of the decay structure of the mixing weights; we set $s = 2, 5, 10, 20$. For each run, the posterior modal partition together with the estimated mean profile and $95\%$ credible band for each group are computed.

In Figure 1, we only present the results for the case $s = 5$ since for all cases, similar results were obtained. The supplementary material presents the results for all the cases. Additionally, Figure 2 displays the density-based Silhouette⁴¹ plot, and the dendrogram for simplifying the estimated clustering as explained in the previous section.

Figure 1.

Posterior modal clustering for the simulated data, using $s = 5$ for modulating the weight’s decay structure. The data are presented as continuous black line segments. Each panel represents one estimated group and it includes its estimated mean profile (red line) and its $95$ % credible band (in cyan). The estimated groups exhibit patterns (indicated in each panel caption) close to the data generating models described in the text. (a) Group 1, slightly periodic pattern. (b) Group 2, constant pattern. (c) Group 3, periodic pattern. (d) Group 4, higher periodic pattern. (e) Group 5, increasing trend pattern.

Figure 2.

(a) Density-based Silhouette and (b) resulting dendrogram, using the symmetrized KL divergence, for the estimated clustering of the simulated data. For both panels, group labels correspond to those of Figure 1. The negative values in the Silhouette plot, for Group 2, indicate that the associated observations would be better allocated to some other group. On the other hand, the dendrogram suggests that, if two groups were to be merged, Groups 1 and 2 would be the first candidates.

The density-based Silhouette (DBS) information, a modification of the Silhouette information⁴² for model-based clustering procedures, is a method to evaluate the quality of a particular clustering given a data set. This information is computed for each observation, and a large value indicates agreement with the assigned group; small or negative values indicate that such an observation could be better placed in another group.

The results in Figures 1 and 2 show that our method is able to recover the five groups, and that their mean profiles are close to the corresponding data-generating functions. The DBS summary shows, in general, a good level of agreement with the estimated groups. As expected, the estimated groups labeled as 1 and 2 seem to be the most problematic; they correspond to the periodic function with $k = 1$ and the constant one, respectively. Furthermore, the dendrogram (Figure 2(b)) suggests that, in case one wants to collapse some groups, these two should be merged first.

4. Identification of groups of adiposity trajectories

Childhood obesity constitutes a worldwide public health problem.⁴³ It is crucial to study the trajectories of adiposity from an early age, since it has been identified that adiposity is closely related to what happens in critical periods of development, such as the “first 1000 days” of life, and also to lifestyles during childhood.⁴⁴ Moreover, it has been shown that the patterns of change in adiposity can be heterogeneous over time and vary between children.⁴⁵ There are studies attempting to understand this heterogeneity in order to identify the patterns of change that occur naturally in populations, resulting in the identification of factors that influence an early onset and development of obesity as well as early metabolic disorders. However, most of the existing evidence comes from high-income countries, where socioeconomic, cultural and nutritional conditions are different from those of middle- and low-income countries. Moreover, in the latter, there are few longitudinal studies considering this kind of analysis from a longitudinal profile perspective.⁷ There is a need to conduct statistical studies that identify groups of individuals at higher risk of obesity, as well as their potentially associated determinants and critical windows during pregnancy, infancy and childhood, in order to guide interventions and strategies that contribute to the prevention and containment of obesity in the population. Given this, the objective of the study was to identify groups of adiposity trajectories using measurements of the body mass index (BMI) $z$ score from six months to ten years of age in a contemporary cohort of Mexican children.

In this longitudinal study, a secondary analysis of data from the POSGRAD birth cohort,³⁴ the BMI $z$ score was estimated for $143$ girls and $143$ boys with complete anthropometric data at ages $6$, $12$, $24$, $48$, $60$, $84$, $96$, and $120$ months. Adiposity was characterized in terms of the age- and sex-specific BMI $z$ score based on WHO growth standards, expressed as a continuous variable. Its estimation was performed using WHO Anthro v. 3.2.2 software.⁴⁶

Children with BMI $z$ scores considered atypical, that is, outside the interval $[- 5, 5]$, were excluded following the WHO Expert Committee.⁴⁷ Additionally, sociodemographic information, mother–child variables, and child variables such as breastfeeding practices, diet and physical activity were available. A more detailed description of the methodology can be found in Ramirez-Silva et al.³⁴

Therefore, the variable of interest $y_{i}$ is the BMI $z$ score at these $m = 8$ instants, and we cluster the longitudinal observations separately for each biological sex. For running the sampling method, hyperparameters related to the weights were fixed as $(α, β) = (1, 1)$ for the parameter $λ$ and $s = 5$ for the parameters $r_{i}$; different values of $s$ were tested, but the results were similar. The kernel hyperparameters $ϕ_{t}$ were taken to be the same for each instant $t$, that is $ϕ_{t} = (0, 40, 0.1, 0.1)$, allowing a large variance. For each data set, the Gibbs sampler was run to obtain a sample of size $5000$, after discarding a first batch of $5000$ iterations.

The posterior modal partition ${\tilde{C}}_{g}$ for girls, occurring with probability $0.2184$, is presented in Figure 3; each panel represents a single cluster. Black lines correspond to the observations; additionally, the mean profile is displayed in red, and the $95\%$ highest probability intervals for the predictive distribution are shown in cyan. The resulting six clusters exhibit different behaviors; however, if we wish to simplify them, we can use the symmetrized KL divergence and compute an agglomerative hierarchical clustering with average linkage. In this case, the resulting dendrogram is shown in Figure 4(a). We can see that it is possible to reduce the number of clusters, for example to four, by merging the second and third clusters, and the first and sixth ones, leaving the fourth and fifth clusters unchanged. This simplified clustering structure is presented in Figure 5. It is worth mentioning that we also clustered the data using two more methods for Gaussian mixture models: the EM algorithm and Dirichlet process mixtures. In the first case, the mclust package was used,⁴⁸ letting its Mclust function select the optimal fit. For the second case, the pyrichlet package was used⁴⁹ to estimate the posterior modal partition using a Gibbs sampler; here, we set the base measure parameters as in our setting. In both cases, the estimated clustering has only two groups.

Figure 3.

Posterior modal clustering, ${\tilde{C}}_{g}$ , for girls in the BMI data set. Each panel represents one group. Observations are displayed in black, the mean profile in red, and the $95$ % highest probability density interval for the predictive distribution in cyan. (a) Group 1, (b) Group 2, (c) group 3, (d) group 4, (e) group 5 and (f) group 6.

Figure 4.

Resulting dendrogram for the clustering structure obtained for each sex, based on the posterior modal partition, using the symmetrized KL divergence. Labels correspond to those of Figures 3 and 6, respectively. (a) Girls and (b) Boys.

Figure 5.

Simplified clustering structure for girls based on the posterior modal partition ${\tilde{C}}_{g}$ . Each panel represents a new group. As indicated in the sub-captions, the new groups were obtained by merging some of ${\tilde{C}}_{g}$ (cf. Figure 3). Observations are shown in gray, original mean profiles in black, and the new mean profile in red. (a) Groups 2 and 3 merged; (b) groups 1 and 6 merged; (c) group 4 and (d) group 5.

Our result in Figure 5 shows that girls in Panel 5(a), from clusters labeled as 2 and 3, who started at six months with a BMI $z$ score closer to the WHO’s median reference pattern, tended to stay closer to the median, or a zero $z$ score, up to age $120$ months. However, for clusters in Panels 5(b) and 5(d), girls who started at six months with BMI $z$ scores larger than the WHO’s reference values tended to have values further away from, and higher than, the WHO’s reference BMI $z$ score across time. These girls, at age $120$ months, showed average values up to $1.5$ and greater than $2$ standard deviations above the reference pattern. In contrast, girls in Panel 5(c), who at $6$ months had a mean BMI $z$ score lower than the reference pattern, that is less than zero, displayed values within one standard deviation under the reference pattern at age $120$ months.

In the case of boys, the posterior modal partition ${\tilde{C}}_{b}$, occurring with probability $0.3014$, is presented in Figure 6; more groups are estimated. Similarly, to produce a simplified clustering, we can make use of the dendrogram in Figure 4(b) to obtain only four groups (Figure 7). This new clustering exhibits the same patterns identified for the girls, except for Panel 7(d), where, unlike the girls, this group was located at $2$ standard deviations from the BMI $z$-score median reference pattern from age $6$ months and stayed at that level up to age $120$ months. In a similar way, the clustering estimated for this data set by each of the other methods has only two groups.

Figure 6.

Posterior modal clustering, ${\tilde{C}}_{b}$ , for boys in the BMI data set. Each panel represents one group. Observations are displayed in black, the mean profile in red, and the $95$ % highest probability density interval for the predictive distribution in cyan. (a) Group 1; (b) group 2; (c) group 3; (d) group 4; (e) group 5; (f) group 6; (g) group 7; (h) group 8 and (i) group 9.

Figure 7.

Simplified clustering structure for boys based on the posterior modal partition ${\tilde{C}}_{b}$ . Each panel represents a new group. As indicated in the sub-captions, the new groups were obtained by merging some of the groups in ${\tilde{C}}_{b}$ (cf. Figure 6). Observations are shown in gray, original mean profiles in black, and the new mean profile in red. (a) Groups 1, 2, 3 and 8 merged; (b) groups 6 and 7 merged; (c) groups 4 and 5 merged and (d) group 9.

5. Discussion

Cluster analysis is an extremely common task in applied disciplines. It allows one to discover patterns that could help to explain the heterogeneity in a population of interest. We focus on scenarios where the data come from longitudinal studies, as often arise in cohort studies, and have presented a fully Bayesian nonparametric model for identifying profile patterns. Our approach diverges from most frameworks, in which the Dirichlet process is used, and adopts a class of RPMs whose realizations have decreasingly ordered weights, producing simple yet effective mixture models for clustering.
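To make the idea of decreasingly ordered weights concrete, the following sketch computes geometric weights $w_j = \lambda(1-\lambda)^{j-1}$, one well-known construction with this property. Treating it as the exact prior used in this work is an assumption; the value of `lam` and the truncation level are illustrative only.

```python
import numpy as np


def geometric_weights(lam, n_terms):
    """First n_terms weights w_j = lam * (1 - lam)**(j - 1), for j = 1, 2, ..."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    exponents = np.arange(n_terms)  # corresponds to j - 1 = 0, 1, ..., n_terms - 1
    return lam * (1.0 - lam) ** exponents


lam = 0.3  # hypothetical value of the single parameter controlling all weights
w = geometric_weights(lam, 50)

# The weights decrease by construction, and the truncated sum approaches one.
print(bool(np.all(np.diff(w) < 0)))
print(w.sum())
```

Because a single parameter determines every weight, the ordering never has to be imposed after the fact, unlike the weights of a Dirichlet process, which are only stochastically ordered.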

Decreasing weight RPMs have been successfully applied to density estimation (cf. De Blasi et al.³³), as well as to other estimation problems, showing competitive performance even when compared with more standard nonparametric models like the Dirichlet process. As far as we know, the present work is the first to apply this class of RPMs to clustering, also with good results. Moreover, at least in real data applications, when the estimated clustering contains many groups, some of them can be merged with the help of the additional information provided by the methodology; in our case, we used the estimated profile densities.

With respect to the modeling of longitudinal data, we also took a simple approach, avoiding covariates and any dependency structure for describing the evolution of the measurements within the same individual. The appealing features of decreasing weight RPMs and the simple data model work well together, outperforming more common approaches. As the results in the adiposity application suggest, our approach is able to identify populations with differentiated patterns, in particular pinpointing the higher-risk groups. Once these groups are identified, specific studies can be designed to detect potential factors and events associated with them.
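The simple data model can be sketched as follows: with independence across time, the log-density of one trajectory under a given group is a sum of per-time-point terms. The normal kernel, the function names `profile_loglik` and `most_likely_group`, and all numbers below are assumptions made for illustration, not the fitted model.

```python
import numpy as np
from scipy.stats import norm


def profile_loglik(y, mu, sigma):
    """Log-density of one trajectory under independent normal measurements over time."""
    # Independence across time means the joint log-density is a plain sum.
    return norm.logpdf(y, loc=mu, scale=sigma).sum()


def most_likely_group(y, weights, mus, sigmas):
    """Index of the group maximizing log weight plus trajectory log-density."""
    scores = [
        np.log(w) + profile_loglik(y, m, s)
        for w, m, s in zip(weights, mus, sigmas)
    ]
    return int(np.argmax(scores))


# Hypothetical three-group mixture observed at four ages (made-up numbers).
weights = np.array([0.6, 0.3, 0.1])
mus = np.array([
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.2, 1.4, 1.6],
    [-1.0, -1.0, -1.0, -1.0],
])
sigmas = np.full((3, 4), 0.5)

y = np.array([1.0, 1.2, 1.4, 1.5])  # one child's z-score trajectory
print(most_likely_group(y, weights, mus, sigmas))
```

No covariates or within-individual correlation parameters appear anywhere in this kernel, which is what keeps the model simple; such information can be brought in later when each detected group is explored further.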

Supplemental Material

sj-pdf-1-smm-10.1177_09622802251414594 - Supplemental material for Cluster analysis for longitudinal data and its application in the detection of adiposity trajectories

Supplemental material, sj-pdf-1-smm-10.1177_09622802251414594 for Cluster analysis for longitudinal data and its application in the detection of adiposity trajectories by Asael Fabian Martínez, Ivonne Ramírez-Silva and Ruth Fuentes-García in Statistical Methods in Medical Research

Footnotes

Funding

The authors received the following financial support for the authorship and/or publication of this article: Asael Fabian Martínez was supported by PEPADI project 12601018. Ruth Fuentes-García would like to acknowledge the support of PAPIIT IN100823, UNAM.

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental material

Supplemental material for this article is available online.

ORCID iDs

Asael Fabian Martínez

Ruth Fuentes-García

References

1. Esrey, Casella, Habicht J-P. The use of residuals for longitudinal data analysis: the example of child growth. Am J Epidemiol 1990; 131: 365–372.

2. Gibbons, Hedeker, DuToit. Advances in analysis of longitudinal data. Annu Rev Clin Psychol 2010; 6: 79–107.

3. Herle, Micali, Abdulkadir, et al. Identifying typical trajectories in longitudinal data: modelling strategies and interpretations. Eur J Epidemiol 2020; 35: 205–222.

4. Locascio, Atri. An overview of longitudinal data analysis methods for neurological research. Dement Geriatr Cogn Dis Extra 2011; 1: 330–357.

5. Arnau, Bono. Estudios longitudinales de medidas repetidas: Modelos de diseño y análisis. Escritos de Psicología 2008; 2: 32–41.

6. Andreacchi, Bassim, et al. Clustering of obesity-related characteristics: a latent class analysis from the Canadian longitudinal study on aging. Prev Med 2021; 153: 106739.

7. Munthali, Kagura, Lombard, et al. Early life growth predictors of childhood adiposity trajectories and future risk for obesity: birth to twenty cohort. Child Obes 2017; 13: 384–391.

8. Anderson, Hafen, Sofrygin, et al. Comparing predictive abilities of longitudinal child growth models. Stat Med 2019; 38: 3555–3570.

9. Liu, Rovine, Molenaar PCM. Selecting a linear mixed model for longitudinal data: repeated measures analysis of variance, covariance pattern model, and growth curve approaches. Psychol Methods 2012; 17: 15–30.

10. Reinecke, Seddig. Growth mixture models in longitudinal research. Adv Stat Anal 2011; 95: 415–434.

11. Steele. Multilevel models for longitudinal data. J R Stat Soc A: Stat Soc 2007 Oct; 171: 5–19.

12. Huggins, Loesch. On the analysis of mixed longitudinal growth data. Biometrics 1998; 54: 583–595.

13. Booil, Findling, Wang, et al. Targeted use of growth mixture modeling: a learning perspective. Stat Med 2017; 36: 671–686.

14. Kim, Caporaso, et al. Uncovering circadian rhythms in metabolic longitudinal data: a Bayesian latent class modeling approach. Stat Med 2023; 9: 1–14.

15. Müller, Rosner, Iorio, et al. A nonparametric Bayesian model for inference in related longitudinal studies. J R Stat Soc C: Appl Stat 2005; 54: 611–626.

16. Gaskins, Fuentes, De La Cruz. A Bayesian nonparametric model for classification of longitudinal profiles. Biostatistics 2023; 24: 209–225.

17. Jiao, Liang, Wang, et al. Longitudinal data analysis based on Bayesian semiparametric method. Axioms 2023; 12: 431.

18. McNicholas, Murphy. Model-based clustering of longitudinal data. Can J Stat 2010; 38: 153–168.

19. Quintana, Johnson, Waetjen, et al. Bayesian nonparametric longitudinal data analysis. J Am Stat Assoc 2016; 111: 1168–1181.

20. Adhikari, Lecci, Becker, et al. High-dimensional longitudinal classification with the multinomial fused lasso. Stat Med 2019; 38: 2184–2205.

21. Stoitsas, Bahulikar, de Munter, et al. Clustering of trauma patients based on longitudinal data and the application of machine learning to predict recovery. Sci Rep 2022; 12: 16990.

22. Wang. Efficient classification for longitudinal data. Comput Stat Data Anal 2014; 78: 119–134.

23. Koo, Kim. Bayesian nonparametric latent class model for longitudinal data. Stat Methods Med Res 2020; 29: 3381–3395.

24. Chiou J-M, P-L. Functional clustering and identifying substructures of longitudinal data. J R Stat Soc B: Stat Methodol 2007; 69: 679–699.

25. Proust-Lima, Séne, Taylor, et al. Joint latent class models for longitudinal and time-to-event data: a review. Stat Methods Med Res 2014; 23: 74–90.

26. De la Cruz-Mesía, Quintana, Marshall. Model-based clustering for longitudinal data. Comput Stat Data Anal 2008; 52: 1441–1457.

27. De la Cruz-Mesía, Quintana, Müller. Semiparametric Bayesian classification with longitudinal markers. J R Stat Soc C: Appl Stat 2007; 56: 119–137.

28. Ferguson. A Bayesian analysis of some nonparametric problems. Ann Stat 1973; 1: 209–230.

29. Fuentes-García, Mena, Walker. A new Bayesian nonparametric mixture model. Commun Stat - Simul Comput 2010; 39: 669–682.

30. Fuentes-García, Mena, Walker. A nonparametric dependent process for Bayesian regression. Stat Probab Lett 2009; 79: 1112–1119.

31. Mena, Ruggiero, Walker. Geometric stick-breaking processes for continuous-time Bayesian nonparametric modeling. J Stat Plan Inference 2011; 141: 3217–3230.

32. Gutiérrez, Gutiérrez-Peña, Mena. Bayesian nonparametric classification for spectroscopy data. Comput Stat Data Anal 2014; 78: 56–68.

33. De Blasi, Martínez, Mena, et al. On the inferential implications of decreasing weight structures in mixture models. Comput Stat Data Anal 2020; 147: 106940.

34. Ramirez-Silva, Rivera, Trejo-Valdivia, et al. Relative weight gain through age 4 years is associated with increased adiposity, and higher blood pressure and insulinemia at 4–5 years of age in Mexican children. J Nutr 2018; 148: 1135–1143.

35. Perman, Pitman, Yor. Size-biased sampling of Poisson point processes and excursions. Probab Theory Relat Fields 1992; 92: 21–39.

36. Lijoi, Mena, Prünster. Controlling the reinforcement in Bayesian non-parametric mixture models. J R Stat Soc B: Stat Methodol 2007; 69: 715–740.

37. De Blasi, Favaro, Lijoi, et al. Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE Trans Pattern Anal Mach Intell 2015; 37: 212–229.

38. Gil-Leyva, Mena. Stick-breaking processes with exchangeable length variables. J Am Stat Assoc 2023; 118: 537–550.

39. Heller, Ghahramani. Bayesian hierarchical clustering. In Proceedings of the 22nd International Conference on Machine Learning, 2005, pp. 297–304.

40. Qin. Clustering microarray gene expression data using weighted Chinese restaurant process. Bioinformatics 2006; 22: 1988–1997.

41. Menardi. Density-based silhouette diagnostics for clustering methods. Stat Comput 2011; 21: 295–308.

42. Rousseeuw. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 1987; 20: 53–65.

43. Weng, Redsell, Swift, et al. Systematic review and meta-analyses of risk factors for childhood overweight identifiable during infancy. Arch Dis Child 2012; 97: 1019–1026.

44. Huang R-C, de Klerk, Smith, et al. Lifecourse childhood adiposity trajectories associated with adolescent insulin resistance. Diabetes Care 2011; 34: 1019–1025.

45. Garden, Marks, Simpson, et al. Body mass index (BMI) trajectories from birth to 11.5 years: relation to early life food intake. Nutrients 2012; 4: 1382–1398.

46. WHO. WHO Anthro for personal computers, version 3.2.2, 2011: software for assessing growth and development of the world’s children, 2010.

47. WHO Expert Committee. Physical status: the use and interpretation of anthropometry. Technical Report, 1995. WHO.

48. Scrucca, Fraley, Murphy, et al. Model-Based Clustering, Classification, and Density Estimation Using mclust in R. New York: Chapman and Hall/CRC, 2023.

49. Selva, Fuentes-García, Gil-Leyva. pyrichlet: a Python package for density estimation and clustering using Gaussian mixture models. J Stat Softw 2025; 112: 1–39.
