Testing group differences in compositional data, that is, multivariate data referring to parts of a whole, requires focussing on the relative information between components. This is commonly achieved by mapping the data into a sensible logratio coordinate system. Groupings are often defined by an externally given factor but can also emerge from internal features of the data, such as distinct zero patterns, which may reflect an underlaying structure of subpopulations. This work introduces the PERLOG test, a novel non-parametric permutation test to identify significant groupings based on pairwise logratios, the fundamental units of compositional information. The method is suitable for both externally and internally defined groupings. In particular, the case of groups defined according to zero patterns is discussed as a prominent example of the latter. The performance of the proposal as a statistical test and its advantages over conventional multivariate tests are demonstrated through simulation. Real-world applications are illustrated using data from studies on movement behaviours and time-use epidemiology.
Multivariate data representing relative parts of a whole are known as compositional data (CoDa).1 Common examples include chemical mixtures, nutritional compositions, time allocations across daily activities, electoral vote shares, and so on. Although not strictly necessary for their analysis, such data are often rescaled into proportions, percentages, parts per million (ppm) or equivalent relative units. This underscores the relative nature of the data and the fact that the scale used must be irrelevant for analysis and modelling.
Formally, a composition represents an equivalence class2 of all -part vectors in the positive orthant of the -dimensional real space, which can be obtained as from a given vector , where is any positive scaling constant. Every member of the class carries the same relative information and, hence, given any two compositions and , it holds that the ratios and are the same for any pair of parts and . When CoDa are closed to sum to a constant, typically 1 (for proportions) or 100 (for percentages), they are (equivalently) represented within a simplex as sample space. This prompted the development of the logratio methodology for the analysis of CoDa,1 which maps compositions into ordinary real space via logratio transformations or coordinates. This facilitates data analysis and modelling based on the relative information between parts but using ordinary methods.
Given a multivariate data set of size and a predefined grouping factor (i.e. a supervised learning setting in machine learning terminology), the formal assessment of whether the observed differences between groups exceed what would be expected by chance relies on statistical inference tools. Focussing on the ordinary case of differences in location, two feasibly available in software packages and widely used methods are multivariate analysis of variance (MANOVA) and permutational analysis of variance (PERMANOVA). Both aim to test for group differences between mean vectors by jointly considering variables and their interrelationships. MANOVA is a parametric statistical test based on an approximate distribution, which is applicable to regular data sets (i.e. where is larger than ) and assumes multivariate normality and homoscedasticity.3 PERMANOVA is a semiparametric counterpart that operates on dissimilarity measures between groups. It uses a pseudo as test statistic and significance is assessed via permutational algorithms,4 without requiring stringent distributional assumptions and being applicable on wide data sets ().
The application of MANOVA to CoDa has been discussed in detail previously,5 with the formulation based on so-called isometric or orthonormal logratio (ilr/olr) coordinates. However, equivalent results can be obtained using other common logratio systems, such as additive logratios (alr) and centred logratios (clr) (with the caveat that, in the clr case, one part must be dropped beforehand because of the singularity of clr variables). Regarding PERMANOVA, although we found no formal treatment of the method for CoDa in the literature, it has been applied in its standard form in practice using both clr and olr coordinates.6–9 Note that unlike the alr representation, both clr and olr are isometric, meaning that they preserve ordinary distances between compositions and lead to the same results.10,11 This makes them theoretically sound for use with Euclidean distance-based methods in real space.
In addition to cases where the grouping structure is defined by an external factor, groupings may also be given by some internal characteristics of the data. A common example in the context of CoDa sets is the presence of zero patterns. That is, groups of individuals sharing some parts for which the observed value is zero. These lie on the boundaries of the sample space, and the logratios involving zero parts become mathematically undefined. In practice, zero values in CoDa are often related to rounding errors, detection limits of the measuring devices (e.g. when measuring chemical concentrations) or under-reporting due to limited sampling. In these cases, zeros can be interpreted as a form of data censoring, and imputation is a valid approach to replace them with reasonably small values.12–14 However, in some cases, zeros can refer to parts that are truly absent for an individual, so-called structural or essential zeros in the CoDa literature.1,15 Examples include political candidates receiving no votes in a region, individuals spending no money on a specific commodity, say smokers versus non-smokers, or individuals not engaging in a certain daily activity. In such situations, imputation may be regarded unrealistic (although it can still serve as a practical workaround, especially when zeros are rare in the data set). Thus, they can be considered a valid outcome from a joint random process generating zero and non-zero parts, in which case they should ideally be incorporated into the modelling.16,17 Or alternatively, interpreting them as indicators of subpopulations, it may be more appropriate to conduct separate analyses on the subcompositions defined by the zero patterns.
In this work, we introduce statistical tools to support decisions about the influence of external or internal factors taking as reference all pairwise logratios between compositional parts, the most elemental pieces of information in CoDa. The cornerstone is a non-parametric permutation test called PERLOG (PERmutation-based pairwise LOgratio Group) which is applicable to both externally and internally defined groupings under mild conditions. The method allows the exploration of group differences in mean at multiple nested levels, thus offering a comprehensive view of variability sources and key contributors. For internally defined groups, we focus on the zero problem given its practical relevance in CoDa analysis. The performance of the PERLOG test, both as a statistical test and in comparison with existing approaches, is formally evaluated through simulation. Finally, its practical utility is illustrated in two real-world case studies on movement behaviours and time-use epidemiology.
Preliminary background
A CoDa observation conveys relative information in the ratios between parts. When practitioners decide to analyse a data set using compositional methods, they are implicitly assuming that the information contained in any observation is the same as in , for any real scalar . Based on this property, known as scale invariance,1 a composition can be defined as an equivalence class.2 According to such definition, the general expression of a logcontrast is a scale-invariant logratio1
A logcontrast defines a logratio between parts depending on the values of the coefficients , with . Thus, for the corresponding part appears in the numerator, for it appears in the denominator, and for those parts that do not contribute to the logratio. Using these, it is possible to define new real-valued variables where the relative information contained in a composition is somehow combined or aggregated. This is the case of the clr transformation,1 defined as
That is, clr variables are obtained by dividing each part by the geometric mean of all the parts, and taking logs. Note that (2) corresponds to setting the coefficients of the logcontrast (1) to and for , with . It holds that , which means that the clr variables lie in a hyperplane of the -dimensional real space, limiting their applicability in statistical modelling (e.g. they lead to a singular covariance matrix if a multivariate normal distribution is assumed). However, basic geometric concepts such as inner product, distance, and norm can be perfectly defined for CoDa via clr variables.18
Building on these metric elements, one can define orthonormal bases in clr space, from which olr coordinates are derived.19,20 While both clr and olr representations can be used interchangeably in metric-based methods such as distance-based clustering,10 olr coordinates lie in the real space of dimension , matching the true dimensionality of a -part compositions, thus facilitating statistical modelling. A common form of olr coordinates, known as balances, is constructed via a sequential binary partition (SBP) of the parts of into non-overlapping subsets.21 Thus, at the th step of such partition, an olr coordinate is given by
where and denote the numbers of parts entering, respectively, the numerator (superscripted ) and denominator (superscripted ) of the logratio term of the expression. That is, compositional balances represent contrasts between the geometric means of mutually exclusive subsets of parts of the composition. A specific SBP scheme leads to so-called pivot coordinates,22 where the th element is given by
Here, the first pivot coordinate captures all the relative information about any part placed on the first position of the composition, . Such pivot coordinate is proportional to the balance between and the geometric mean of all the other parts and, also, to the first clr variable in (2), namely .
Moreover, both clr variables (2) and balances (3) can be rewritten as normalised aggregations of pairwise logratios. In the first case, adding all pairwise logratios of a given part to the others as
In the second case, adding all pairwise logratios between parts in the numerator and denominator as
These relationships point out that commonly used logratio representations are essentially aggregating the most basic information contained in the set of pairwise logratios between parts of the composition, with . Namely, given a -part composition, pairwise logratios are required (up to reciprocals) to collect such information.
These elemental pairwise logratios are the basis for the PERLOG test introduced in the following section.
Assessing differences between groups from pairwise logratios
Here, we provide details of the PERLOG test, a novel statistical procedure used to test for significant differences between predefined groupings in a CoDa set. The procedure is classified as non-parametric because no distributional assumptions are made regarding the process generating the data. In particular, it belongs to the family of so-called permutation tests,23 where the significance of the observed difference between groups, in terms of means, is assessed by comparing it to the empirical distribution of differences generated by repeatedly and randomly shuffling the observations between groups. This formally sets the distribution of the test statistic under the null hypothesis of no differences, requiring only that the group labels be interchangeable, from which a -value, based on how extreme the observed difference is, is estimated to draw a conclusion as usually. The following subsections distinguish the cases of externally and internally defined groupings.
Groups defined by an external factor
Let us consider a CoDa set which includes subsets of observations arising from a number of predefined groups, according to a given factor variable. First, considering all possible pairs of groups and all pairwise logratios, an elemental measure of dissimilarity is computed. This will be denoted by , referring to the dissimilarity in the logratio between the th and th groups, with , , , and . These elemental dissimilarities are eventually aggregated into an overall dissimilarity measure which is used as input for the permutation test. The details of the entire procedure are described below.
We propose such a dissimilarity measure to be based on the Welch’s -statistic24 which, using analogous notation, is written as
with and , and the corresponding degrees of freedom being computed as
For the th and th groups, and refer to the number of observations in each, whereas and and and are, respectively, the sample means and variances of the pairwise logratios , with . Thus, and are here the corresponding squared standard errors of the sample logratio means.
Note that (7) then draws on the elements of the most basic statistical summary for CoDa given by the so-called variation array, originally introduced in the seminal monograph by J. Aitchison,1 consisting of a matrix containing the parwise logratio means in the lower triangle and the pairwise logratio variances in the upper triangle. Importantly, the variation array of any subcomposition is simply obtained by selecting all the logratio variances and means associated with the parts of such subcomposition. This is particularly relevant when analysing groupings defined by an internal factor as discussed in Section 3.2.
To make the information provided by the values (7) comparable regardless of the number of observations and the sample variance across all pairwise group comparisons and logratios, a mapping is applied to represent them as normalised standard normal quantiles. Thus, firstly, the distribution function of the Student’s with degrees of freedom is evaluated at . The resulting value, a probability denoted by , is then used to obtain an associated -quantile of the standard normal distribution, , by the inverse distribution function. Finally, the elemental dissimilarity measures are computed as
In simple terms, this dissimilarity metric compares an empirical quantile, , with a standard normal quantile, , which serves as a reference point. For instance, when using the ordinary significance level , the latter is . If two groups and are completely comparable, then it holds that , and hence, . That is, there is no dissimilarity between the groups. Otherwise, the larger the value , the larger the dissimilarity between groups.
Including a reference term such as in (8) is useful because an above 1 indicates a prominent role of the associated logratio in differentiating between groups, whereas a value below 1 implies the opposite. This distinction allows accounting for their relevance on a common basis, in combination with the parameter used in the definition of the overall dissimilarity measure . Namely, for a given data set, the elemental dissimilarities are accumulated into
Note that if there are no differences between groups, then is equal to zero. Moreover, choosing in (9) amplifies the influence of the largest and simultaneously downplays the influence of the smallest ones. This effect is illustrated in Figure 1.
Illustration of the effect of the power parameter on the weight of the elemental dissimilarity measure in the overall dissimilarity statistic.
Therefore, is the statistic embedded in the permutation test, which is repeatedly computed across a given number of permutations of the observations (entire rows while keeping the order of the groups) of the original data set, typically doing at least . The corresponding -value is determined by the proportion of times the statistic is greater for the permuted data sets than for the original data set. Finally, the ordinary decision rule is applied: if the estimated -value is lower than a prefixed significance level , then it is concluded that at least one group significantly differs in mean from the others.
On a practical note, if the difference between groups is considerable, a statistical computing system might return a value for rounded to 1, leading to an infinite value for . For example, the highest possible value for this quantile in the R system for statistical computing,25 with default precision, is 8.210 (rounded to three decimal places). The issue can be circumvented by replacing any infinity obtained with this maximum value. In such a case, the highest value of is 4.189 (rounded to three decimal places) (see this situation illustrated in Section 5).
Furthermore, when the result of the above permutation test suggests a significant overall difference between groups, the elemental dissimilarities (from both the original and permuted datasets) can be used to conduct a post hoc analysis. That is, it is possible to investigate which pairs of groups differ from one another, as well as which logratios are involved. Thus, for each pair of groups, the dissimilarity
computed from the original data set is compared with those computed from the permuted data sets, from which associated p-values are obtained. Note that these p-values require some form of adjustment for multiple comparisons.26 If meaningful differences are suggested for pairs , then individual dissimilarity measures can be subsequently compared between original and permuted data sets to analogously explore which individual pairwise logratios govern such a difference (again, subject to adjustment for multiple comparisons). It is important to note that the value of the parameter plays no role when comparing at such individual logratio level. Finally, the contribution of individual pairwise logratios or groups to the differences can be easily quantified by the ratios , and .
Groups defined by an internal factor
We now consider a situation in which a grouping structure originates from certain features of the data set. This involves additional complexities because, unlike the external factor setting, it is necessary to deal with varying subcompositions by group.
As stressed before, focussing on the presence of zero patterns as an example of an internal factor that potentially defines subpopulations in a CoDa set is of main practical relevance. We have emphasised this for the structural zero case, but note that the test can also be useful in the case of zeros due to rounding-off or censoring thresholds. Thus, zero imputation methods generally assume that the data are drawn from a single population, with common mean and covariance structure throughout. Then, the test can be useful for investigating such an assumption. In addition, the procedure can be analogously applied to patterns defined by missing entries or any other internal factor of interest in the subject area.
Taking the framework elaborated in the previous subsection as reference, the overall dissimilarity measure in (9) for the original data set is calculated from the available elemental dissimilarities . This is illustrated by an example in Table 1. As noted before, the required logratio means and variances of the subcompositions associated to each group are extracted from the corresponding partitions of the variation array.
Exemplary case of a data set resulting from a 5-part composition including four different zero patterns.
The permuted data sets are generated similarly to the case of an external factor, where entire observations are randomly rearranged while preserving the original group structure. Elemental dissimilarities are then computed using available logratios within each group. In some cases, logratio values may be available for every combination and but, for the computation of the overall dissimilarity, only the elemental dissimilarities with the same indices as in the original dataset are considered. Moreover, to ensure a sufficient number of logratio values for minimally reliable calculations, which in practice can be particularly difficult to control in the case of internal factors, we require that at least values are present in both groups and for each required combination . This serves as a practical rule of thumb, inspired by Mead’s resource equation for determining the minimum group sizes to compare two means in the absence of any other information.27
Performance assessment by simulation
Case of an external factor
The PERLOG test is assessed in terms of type I error and power (i.e. type II error) as main performance metrics. Briefly, data from a -part composition are generated by randomly sampling vectors of pivot coordinates (4) from a multivariate normal distribution for each group . They are then back-transformed through inverse olr transformation and combined into a CoDa set of size , with .
The range of scenarios considered results from combining the following parameter settings: (a) size of the composition and number of groups ; (b) group sizes and ; (c) tuning parameter ; (d) group variabilities determined by , , and covariance matrix (form inspired by estimates from the example data sets); and (e) group means and , with when evaluating type I errors and when evaluating type II errors. That is, the parameter relates to the effect size, the difference in group locations. The use of pivot coordinates facilitates here implementing the case where the difference between groups is led by the relative dominance of one part against the others. The position of in the vector of pivot coordinates is chosen randomly in each simulation run. Thus, to estimate the type II error, situations are considered in which the first group differs from the others in terms of the distance between and , which is determined by . The performance of the PERLOG test is compared with ordinary MANOVA (based on the Pillai’s trace statistic) and PERMANOVA tests. These are applied with default settings on the data expressed on pivot olr coordinates. Hence, both MANOVA and PERMANOVA are applied here as they are conventionally defined and implemented in widely used software packages (in this study using, respectively, the manova function in the R package stats and the adonis function in the R package vegan), thereby reflecting their standard usage among practitioners.
A total of simulations are run for each scenario when evaluating type I error, which is estimated as the proportion of -values obtained from the test that are smaller than the given significance level (i.e. the proportion of false positive results). If the test works properly, the estimated type I errors should be approximately or, more precisely, the corresponding confidence interval should include . Using the Wilson’s score confidence interval as reference,28 acceptable type I error estimates fall in the range . Moreover, the assessment of type II errors is based on simulations for each scenario. This is estimated as the proportion of -values greater than (i.e. the proportion of false negative results).
The results of this simulation study are summarised in Appendix A. Figure 7 shows results for type I errors for different configurations of parameter settings. It can be observed that the PERLOG test basically produces results within the expected range across all scenarios. In general, no pattern according to the value of parameter is obvious here. As to statistical power (see Figures 8, 9 and 10 for size of the composition equal to 5, 10 and 20 respectively), it increases as increases. Although such a gain generally becomes negligible for larger values of , depending on the size of the composition (larger tends to require larger for the results to stabilise). Beyond this, the shape of the power curves aligns with the expected shape for a statistical test. Thus, larger group sizes favour a faster approach to the highest power levels as , which determines the difference between the groups, increases. This appears to be modulated to some extent by the size of the composition, with a larger being generally associated with slower convergence. However, note that this effect is not specific to the method, but generally relates to the curse of dimensionality. Moreover, higher within-group variability, as determined by the parameter , negatively affects power, as expected.
When compared with PERMANOVA and MANOVA, the PERLOG test consistently outperforms both in terms of type I error rate in scenarios with unequal variances and sample sizes (Figure 7). PERMANOVA and MANOVA tend to be overly conservative when the logratios from the smaller group exhibit smaller variances (i.e. ), while it becomes overly liberal when the logratios from the smaller group have larger variances (i.e. ). Regarding power (Figures 8, 9 and 10), focussing on PERLOG with (the value for which the results stabilise across all scenarios) and excluding the cases where the two alternatives are too liberal, we can observe that PERMANOVA generally exhibits the worst rates. As for MANOVA, its results are largely comparable to those of PERLOG, although it tends to reach less statistical power than PERLOG with increasing .
Note that some Welch-type MANOVA analogues and modifications of PERMANOVA have been proposed in the statistical literature to address heterogeneous group dispersion structures for ordinary (non-compositional) multivariate data. For instance, the Wald-type statistic (WTS) and a modified ANOVA-type statistic (MATS) for MANOVA were recently introduced.29 We conducted a preliminary evaluation of these alternatives and found that MATS depends on the choice of logratio representation, making it unsuitable for CoDa. In contrast, the WTS is not affected by that; but, similarly to standard MANOVA, it cannot be applied directly on clr variables due to the singularity of the corresponding covariance matrix. Moreover, in own testing running WTS-based MANOVA (on pivot olr coordinates) though a few scenarios considering 5-part and 10-part compositions, we found it leading to markedly inflated type I errors (reaching rates around 0.5 in the case of 10-part compositions). These findings are, in fact, consistent with the literature indicating that this test is highly liberal.30 Regarding PERMANOVA, the related literature emphasises that the method is inherently robust to heterogeneity in dispersion, particularly when groups sizes are balanced, and remains robust under moderate departures from dispersion homogeneity.31 Our study indicates that PERMANOVA generally maintains acceptable type I error rates, although it performs particularly poorly in scenarios involving heterogeneous group covariances and unbalanced groups, similarly to MANOVA. A modification of the pseudo statistics to account for heterogeneity in unbalances designs has been proposed.32 Apart from the above, to the best of our knowledge, these variants have neither been formulated for nor applied to CoDa so far, and they remain largely absent from mainstream software implementations and routine practice, limiting their practical applicability in the context of the current study.
Case of an internal factor: Zero patterns
Given that ordinary MANOVA and PERMANOVA are not applicable in this situation, the results in this section refer only to the performance of the PERLOG test (allowing for varying values of the tuning parameter ).
First, artificial data are generated following the same procedure as in Section 4.1, although considering a reduced number of scenarios with more relaxed parameter settings. A wider range of group sizes are allowed by sampling from a uniform distribution , with , and from . Group is set as a zero-free pattern, with this ordinarily being the largest group in practice. For the covariance structure, either the same is set across groups, with , or varying values are considered by sampling them from . The minimal number of groups considered is (note that would lead to the analysis of subcompositions omitting the zero parts).
Subsequently, zeros patterns are forced randomly but ensuring the following properties: (a) there are indeed different zero patterns; (b) each compositional part has non-zero values in at least two groups (otherwise subcompositions without the part in question could be analysed); (c) one pattern (corresponding to the largest group) represents the zero-free case; (d) one pattern has zeros in just one part; (e) one pattern has zeros in parts, where is randomly chosen from ; (f) the remaining patterns can have zeros only in parts chosen to be zero in points (d) or (e), that is no more than parts contain zeros; and (g) when assessing type II error rates, the group representing a subpopulation has non-zero values in the part , where the index corresponds to the position of in the vector (i.e. there are no zeros in the part leading the contribution to group distinctiveness).
The results of this simulation study are summarised in Appendix B. Figure 11 shows that the PERLOG test produces type I error rates within the expected range for all parameter settings. No general trend is observed regarding the tuning parameter . As for statistical power, Figure 12 displays the expected behaviour, showing an increase as the parameter , controlling the difference between centres, rises. Meanwhile, parameter does not appear to have any meaningful effect beyond in general. No differences are evident according to the number of groups or the size of the composition . Moreover, as expected, the lowest power rates are generally reached when the parameter , which affects the group sizes, is set to a lower value of 25.
In summary, the trends observed in this simulation study are similar to those in the external factor case, although the impact of parameters and is less pronounced. Moreover, it is worth noting that the results for homogeneous groups are largely comparable to those observed for heterogeneous groups, reflecting on the consistency of the performance of the method in these two settings.
Illustrative applications
A core principle of movement behaviours and time-use epidemiology is that how we allocate our time influences our health. Traditionally, research in this field focussed on the relationship between a single time-use behaviour (such as screen time) and a health outcome. However, this approach overlooked the fact that time-use behaviours are inherently interdependent. Since time is limited to 24 hours per day, spending more daily time on one activity necessarily reduces the time available for others. These trade-offs can themselves influence health outcomes. As such, time-use data are compositional in nature and should be analysed and interpreted in ways that reflect their relative structure33,34 In recent years, this has become feasible through the increasing adoption and adaption of the CoDa methodology to tackle common research questions in this field.35–41
Movement behaviours in relation to age groups
In this first illustrative case study, we investigate differences in movement behaviours according to age group among Czech schoolgirls. The data were obtained from the study conducted in Štefelová et al.,42 which focussed on the relationships between daily movement behaviours and adiposity. The raw activity data were collected using a wrist-worn accelerometer. They were then categorised into time spent in sedentary behaviour (SB) and light, moderate, and vigorous physical activity (LPA, MPA and VPA) based on ordinary intensity cut-off points established in the subject area.43 Sleep time was determined using algorithms fed from sleep logs.44 Hence, the data set used here consists of a fully observed 5-part composition measured for individuals. Moreover, age categories were considered: , and years; with corresponding group sizes of , and respectively. Age categories then play the role of external factor in this application.
Table 2 shows the compositional mean vector by age group. This is computed as the vector of geometric means for each part, , , which has been subsequently closed to be expressed in min/day adding up to 1440 to ease interpretability in the application context. Note that these geometric means are computed over the observations for each group . This result indicates that schoolgirls tend to spend relatively more time in SB at the expense of other behaviours as they age. Moreover, Figure 2 displays histograms of the distributions of all pairwise logratios within each age group. It can be observed that all logratios involving exhibit the same trend, that is, the histograms appear analogously ordered according to age group. The opposite trend is observed for the logratios involving , suggesting that the logratio is a main direction of variation in this data set. This is confirmed in Figure 3, where an ordinary compositional biplot of the data set is displayed. The compositional biplot45 provides a low-dimensional representation of the structure of relative variation of the data, with the points corresponding to the observations and arrows from the origin to the clr transformed parts of the composition, that is for (see equation (2)). In brief, it is built from a principal component analysis (PCA) of the clr data set, with the first two PCs (PC1 and PC2, which explain the highest fraction of the total variability of the data) commonly used to project the data in two dimensions for easy visualisation. The squared lengths of the arrows approximate the variances of the clr variables, , and the sum of these latter coincides with the total variability of the data set. Hence, the relative contribution of each part to the total variability can be computed: 66.61% VPA, 15.30% SB, 7.75% sleep, 5.48% LPA and 4.86% MPA. Moreover, the squared lengths of the links between arrowheads approximate the logratio variances, , between any two parts and represented by them.1,45Figure 3 accounts for all this information, with the longest link observed between VPA on the left-hand side and SB on the right-hand side, and the age groups mostly distinguished along the PC1 axis (90.44% of the total variability explained).
Histograms of the distribution of pairwise logratios of movement behaviours of Czech schoolgirls study according to age groups.
Compositional biplot of the Czech schoolgirl movement behaviour data set (97.07% total variability explained by the first two principal components, PC1 and PC2).
Compositional centre of schoolgirl daily movement behaviours (in min/day by age group in years).
Sleep
SB
LPA
MPA
VPA
8–11
11–15
15–19
FDR-adjusted -values from PERLOG test comparisons at zero pattern and pairwise logratio levels for the adolescent sedentary behaviour data set.
After the exploratory analysis above, the PERLOG test is applied to the data set, performing permutations of the rows and setting the tuning parameter to 2. As anticipated, a statistically significant difference between groups is suggested at the usual 5% level (overall p-value ). Further post hoc analysis to test for differences between each pair of groups (three pairwise comparisons) also leads to p-values in all cases, after adjustment for false discovery rate (FDR) in multiple comparisons using the Benjamini and Hochberg’s method.46 Hence, according to these results, each group significantly differed in mean from the others. Note that using higher values of , up to as considered in the previous section, led to equivalent results, yielding -values approximately zero in all cases. Looking at the individual logratios across groups, all tests showed significant differences (FDR-adjusted p-values ), except for the logratio (, and for the comparisons between, respectively, age groups and 11–15, 11–15 and 15–19, 8–11 and 15–19). The direction of the difference can be learnt from the signs of the individual Welch’s -statistics (7). These, along with other numerical results, are summarised in Table 4 of Appendix C.
Note that Table 4 lists several elemental dissimilarities equal to . As discussed in Section 3.1, this is a consequence of achieving the maximum precision of the R system used to run the procedure. Therefore, the information about relative contributions at different levels is to some extent distorted here, but some conclusions can still be drawn. Thus, for instance, comparing adjacent age categories, Group 11–15 appears to be more similar to Group 8–11 than to Group 15–19 (with relative contributions to the overall dissimilarity of the respective differences between pairs being approximately and respectively). At the pairwise logratio level, Table 4 also shows that the two younger age groups differ primarily in the logratio of sleep to SB. Additionally, logratios involving VPA, followed by those with SB, are the main contributors to making the oldest age group the most distinguishable.
In summary, the analysis conducted supports the idea that the level of daily physical activity tends to decrease with age in Czech schoolgirls. In fact, each logratio of a movement behaviour to a less physically intense one is significantly higher when shifting into younger age groups, except for the logratio of LPA to sleep, which remains fairly constant across age groups.
Sedentary behaviours according to zero patterns
The second illustrative application comprises groups given by zero patterns in the data set as an internal factor. We used a reduced sample of 513 observations extracted from a cross-sectional study, conducted in Belgium for approximately one year between 2013 and 2014, to determine correlates of context-specific sedentary behaviours in the population.47 The data refer to the time spent in sedentary behaviours by adolescents (12–18 years old) on weekends, and they were collected using a designed time diary. The measured composition consists of five parts: sitting time while doing schoolwork (Sch); watching television, using computer for leisure or gaming (Scr, standing for screen time); reading (Rea); socialising (Soc); and doing other activities (Oth).
Figure 4 summarises the four main zero patterns found in this data set (ordered decreasingly by rows according to the relative frequency of the patterns across individuals and by columns according to the percentage of zeros in each compositional part). Note that 10 additional zero patterns were observed in the original data set, however, their relative frequency was very low, and they mostly corresponded with meaningless configurations in practice (e.g. 9 individuals reported no time spent in all categories but Scr). These samples were omitted from the analysis as part of the data preprocessing stage to simplify the illustration here. Thus, the final data set involved information from individuals. As can be seen, the four patterns are determined by the possible combinations of presence and absence of the reading and social time categories, with the reading category containing 47.42% zeros, and the social category containing 15.96% zeros. Accordingly, we can in practice refer to the largest group (zero pattern #1; 45.17% of individuals) as social readers, to the second largest group (zero pattern #3; 38.88%) as social non-readers, to the subsequent one in size (zero pattern #4; 8.54 %) as non-social non-readers and to the smallest one (zero pattern #2; 7.42 %) as non-social readers. The values in the cells are the geometric means of the available parts closed to 100 for each zero pattern, that is expressed in percentages. Recall that the relevant information is in the ratios between parts and, hence, these should be the focus of the analysis for it to be consistent and preserved regardless of the scale of the data. Thus, Figure 5 provides some initial insights into defining the differences between the groups. Some deviation is apparent in relation to pattern #2 (non-social readers), with individuals who tend to spend more time doing schoolwork relative to other behaviours.
Main zero patterns in the adolescent sedentary behaviour data set. The column bars refer to the percentage of zeros by compositional part and the row bars refer to the percentage of observations within each pattern. The numbers in the cells correspond to the closed geometric means of the subcomposition available by zero pattern. Notation: Rea (reading), Soc (social), Sch (schoolwork), Scr (screen time) and Oth (others).
Boxplots of pairwise logratios for the adolescent sedentary behaviour data set according to zero pattern. Notation: Rea (reading), Soc (social), Sch (schoolwork), Scr (screen time) and Oth (others).
The results of the PERLOG test (setting again and ) support the above discussion. First, significant overall differences between groups are concluded (-value ). When looking at the individual comparisons between groups, pattern #2 is confirmed as the deviating group (see Table 3a). Lastly, testing at the pairwise logratio reveals that this is mainly driven by the relatively large amount of time invested in schoolwork by individuals in this group (see Table 3b). As to the tuning parameter , the simulation study reported before suggests that the results are generally expected to be fairly comparable for values from 2 on, particularly in relation to type I error and in the case of small to moderate compositions. For illustration, Table 5 in Appendix C shows the results of a sensitivity analysis with this regard. It can be observed that the results in terms of test -values to assess the significance of overall and group comparisons are very much equivalent across the values of the parameter . Moreover, a summary of statistics and numerical results from the PERLOG test (using ) can be found in Table 6. The results suggest that the main contributor to overall group differences (33.43%) is the comparison between pattern #3 (social non-readers) and pattern #2 (non-social readers), the most distinct group. Furthermore, logratios involving schoolwork time contribute most significantly to distinguishing the group defined by pattern #2.
Additionally, Figure 6 shows bootstrap estimates of the differences in mean time spent on each sedentary behaviour between groups (mean and confidence interval; 1000 resamples run). These were computed on the logratio scale; that is, based on ,5 with for pattern numbers and . Thus, values around zero indicate similar allocation by group. The largest differences relate to schoolwork in general, with the relative time allocated to it standing out for those individuals in pattern #2. In contrast, screen time is the most comparable behaviour across groups, as well as the one with the lowest variability in the differences according to the width of the 95% confidence intervals.
Bootstrap estimates of differences in mean time spent on each sedentary behaviour (a value zero means no differences between zero patterns). Notation: Rea (reading), Soc (social), Sch (schoolwork), Scr (screen time) and Oth (others).
Final remarks
We have introduced a novel non-parametric test, referred to as the PERLOG test, for statistically assessing the relevance of mean differences between groups in compositional data sets, where groupings may be defined either externally or internally by a given factor. The method does not rely on stringent data assumptions or any particular logratio coordinate representation. Instead, it builds on the most elemental units containing information about the relative variation structure in compositional data, that is the pairwise logratio means and variances. Moreover, by construction, it enables the investigation of the sources of variation between groups at different levels.
The use of the method has been illustrated in two exemplary case studies on movement behaviours and time-use epidemiology, a scientific domain where compositional methods have gained considerable traction in recent years. A simulation study was set up to formally explore and demonstrate its performance in a range of scenarios, using this application area as reference framework. While any simulation study is necessarily limited and constrained, it provides valuable insight into the performance of the method under varying conditions. These include changing effect sizes, number of parts of the composition, and number, size and underlying variation structure of the groups.
As a statistical hypothesis test, such simulation-based evaluation, based on type I error and statistical power rates as ordinarily, demonstrates that it verifies the basic properties expected for any such test. For the case of external grouping factors, the method provides results comparable to conventional approaches such as MANOVA and PERMANOVA when their underlying assumptions are met. Moreover, it generally outperforms them under less ideal conditions. Additionally, a key distinguishing feature of the PERLOG test is that it is applicable to situations where the groupings are defined by an internal factor. Therefore, we contend that the proposed method provides practitioners with a robust and practical all-in-one solution.
In real-world applications, data-generating mechanisms are often more complex and partially or entirely unknown. It is in these cases in which we anticipate the PERLOG test to offer practitioners a distinct advantage over standard methods to compare groups. This is evident in its application to identifying subpopulations in data sets based on zero patterns, an issue of practical relevance in compositional data analysis. However, particularly for internally defined grouping factors, the user should exercise caution to ensure that minimal conditions to conduct meaningful statistical comparisons apply. The number of patterns and the scarcity of data within them will naturally tend to increase with the size of the data set. Consequently, some preliminary processing may be convenient to make the inference task feasible. Moreover, although our investigation suggests robustness over increasing values of the tuning parameter , particularly in the case of an internal grouping factor, it would be recommendable in practice to conduct a sensitivity analysis to confirm the stability of results and conclusions.
Related to the above, note that the PERLOG test has been developed and demonstrated here for compositional data sets typically comprising up to a moderately large number of parts. Although there is no inherent technical limitation preventing its application, it is expected that moving into high dimensions would require some adaptations to address new challenges specific to such scenario. Thus, handling pairwise logratios within the embedded resampling scheme increases rapidly with the size of the composition. For example, on a workstation equipped with an Intel Core i9-13900K processor (24 cores) and 64GB of RAM as used in this work, the method requires approximately 1.5 minutes for a 100-part composition (4950 pairwise logratios) and 41 minutes for a 500-part composition (124,750 pairwise logratios), based on 50 observations split evenly between two groups. Therefore, its use in fields such as the omics sciences, where compositions often consist of hundreds or even thousands of parts, can be computationally challenging. Furthermore, dealing with a structure of internally defined groupings in this setting, where also data sparsity is a common issue, can hamper meaningful inference. Moreover, post hoc testing at the pairwise logratio level would require a strategy to aid interpretation, such as ranking parts according to involvement in significant differences across pairwise logratios. Hence, developing an optimised formulation of the method for high-dimensional compositions is an open direction for future research.
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Spanish Ministry of Science and Innovation [MCIN/AEI/10.13039/501100011033] (PID2021-123833OB-I00) and Department of Research and Universities of the Generalitat de Catalunya (2021SGR01197).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
No new data sets were generated in this study. Both data sets used in the illustrative examples come from previous publications. We refer the reader to the original references for any related request.
References
1.
AitchisonJ. The statistical analysis of compositional data. Monographs on statistics and applied probability. London: Chapman & Hall, 1986.
2.
Barceló-VidalCMartín-FernándezJA. The mathematics of compositional analysis. Aust J Stat2016; 45: 57–71.
3.
SmithHGnanadesikanRHughesJ. Multivariate analysis of variance (MANOVA). Biometrics1962; 18: 22–41.
Martín-FernándezJADaunis-i EstadellaJMateu-FiguerasG. On the interpretation of differences between groups for compositional data. Stat Oper Res Trans2015; 39: 231–252.
6.
BrücknerAHeethoffM. A chemo-ecologists’ practical guide to compositional data analysis. Chemoecology2017; 27: 33–46.
7.
ButlerBPalarea-AlbaladejoJShepherdKDet al.Mineral-nutrient relationships in African soils assessed using cluster analysis of X-ray powder diffraction patterns and compositional methods. Geoderma2020; 375: 114474.
8.
WeiseDRJungHPalarea-AlbaladejoJet al.Compositional data analysis of smoke emissions from debris piles with low-density polyethylene. J Air Waste Manage Assoc2020; 70: 834–845.
9.
OdintsovaVEKlimenkoNSTyakhtAV. Approximation of a microbiome composition shift by a change in a single balance between two groups of taxa. Msystems2022; 7: e00155.
10.
Palarea-AlbaladejoJMartín-FernándezJASotoJA. Dealing with distances and transformations for fuzzy C-means clustering of compositional data. J Classif2012; 29: 144–169.
11.
NesrstováVJaškováPPavlůIet al.Simple enough, but not simpler: reconsidering additive logratio coordinates in compositional analysis. SORT–Stat Oper Res Trans2023; 47: 269–294.
12.
Palarea-AlbaladejoJMartín-FernándezJA. zCompositions – R package for multivariate imputation of left-censored data under a compositional approach. Chemometr Intell Lab Syst2015; 143: 85–96.
13.
Martín-FernándezJAHronKTemplMet al.Bayesian-multiplicative treatment of count zeros in compositional data sets. Stat Modell2015; 15: 134–158.
14.
Palarea-AlbaladejoJMartín-FernándezJARuiz-GazenAet al.lrSVD: an efficient imputation algorithm for incomplete high-throughput compositional data. J Chemom2022; 36: e3459.
15.
Martín-FernándezJAPalareaJOleaRA. Dealing with zeros. In: Pawlowsky-Glahn V and Buccianti A (eds), Compositional data analysis: theory and applications, chapter 4. Chichester: John Wiley & Sons, 2011. pp. 43–58.
16.
AitchisonJKayJW. Possible solution of some essential zero problems in compositional data analysis. In: Thió-Henestrosa S and Martín-Fernández JA (eds), Proceedings of CoDaWork’03, The 1st Compositional Data Analysis Workshop. Girona, Spain: Universitat de Girona. https://dugi-doc.udg.edu/handle/10256/652.
17.
Bacon-ShoneJ. Modelling structural zeros in compositional data. In: Thió-Henestrosa S, Martín-Fernández JA (eds) Proceedings of CoDaWork’03, The 1st Compositional Data Analysis Workshop. Girona, Spain: Universitat de Girona. https://dugi-doc.udg.edu/handle/10256/661.
18.
Pawlowsky-GlahnVEgozcueJJ. About covariance and correlation in the simplex. In: Proceedings of the 4th International Conference on Mathematical Geology. Cancun, p. 10.
19.
EgozcueJPawlowsky-GlahnVMateu-FiguerasGet al.Isometric logratio transformations for compositional data analysis. Math Geol2003; 35: 279–300.
20.
Martín-FernándezJA. Comments on: Compositional data: the sample space and its structure. TEST2019; 28: 653–657.
21.
Pawlowsky-GlahnVEgozcueJ. Exploring compositional data with the CoDa-dendrogram. Aust J Stat2011; 40: 103–113.
22.
FilzmoserPHronKTemplM. Applied compositional data analysis. Springer series in statistics. Cham: Springer, 2018.
23.
GoodP. Permutation tests: a practical guide to resampling methods for testing hypotheses. Springer series in statistics. New York: Springer, 2000.
24.
WelchBL. The generalization of ‘student’s’ problem when several different population variances are involved. Biometrika1947; 34: 28–35.
25.
R Core Team. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2024. https://www.r-project.org.
26.
CuiXDickhausTDingYet al. (eds) Handbook of multiple comparisons. Handbooks of modern statistical methods. Boca Raton: Chapman & Hall/CRC, 2021.
27.
MeadR. The design of experiments: statistical principles for practical applications. Cambridge: Cambridge University Press, 1988.
28.
WilsonEB. Probable inference, the law of succession, and statistical inference. J Am Stat Assoc1927; 22: 209–212.
29.
FriedrichSPaulyM. MATS: inference for potentially singular and heteroscedastic MANOVA. J Multivar Anal2018; 165: 166–179. DOI: 10.1016/j.jmva.2017.12.008.
30.
KonietschkeFBathkeACHarrarSWet al.Parametric and nonparametric bootstrap methods for general MANOVA. J Multivar Anal2015; 140: 291–301. DOI: 10.1016/j.jmva.2015.05.001.
31.
AndersonMJWalshDCI. PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: what null hypothesis are you testing?. Ecol Monogr2013; 83: 557–574.
32.
AndersonMJWalshDCIRobert ClarkeKet al.Some solutions to the multivariate Behrens–Fisher problem for dissimilarity-based analyses. Aust N Z J Stat2017; 59: 57–79. 10.1111/anzs.12176.
33.
ChastinSPalarea-AlbaladejoJDontjeMet al.Combined effects of time spent in physical activity, sedentary behaviors and sleep on obesity and cardio-metabolic health markers: a novel compositional data analysis approach. PLoS ONE2015; 10: 1–37.
34.
ŽPedišićDDumuidTOlds. Integrating sleep, sedentary behaviour, and physical activity research in the emerging field of time-use epidemiology: definitions, concepts, statistical methods, theoretical framework, and future directions. Kinesiology2017; 49: 252–269.
35.
DumuidDStanfordTOldsTet al.Compositional data analysis for physical activity, sedentary time and sleep research. Stat Methods Med Res2018; 27: 3726–3738.
36.
ŠtefelováNDygrýnJHronKet al.Robust compositional analysis of physical activity and sedentary behaviour data. Int J Environ Res Public Health2018; 15: 2248.
37.
DumuidDPedišićŽStanfordTet al.The compositional isotemporal substitution model: a method for estimating changes in a health outcome for reallocation of time between sleep, physical activity and sedentary behaviour. Stat Methods Med Res2019; 28: 846–857.
38.
PelclováJŠtefelováNDumuidDet al.Are longitudinal reallocations of time between movement behaviours associated with adiposity among elderly women? A compositional isotemporal substitution analysis. Int J Obes2020; 44: 857–864.
39.
GábaADygrýnJŠtefelováNet al.How do short sleepers use extra waking hours? A compositional analysis of 24-hour time-use patterns among children and adolescents. Int J Behav Nutr Phys Activity2020; 17: 104.
40.
GábaAPediši掊tefelováNet al.Sedentary behavior patterns and adiposity in children: a study based on compositional data analysis. BMC Pediatr2020; 20: 147.
41.
McGregorDEPalarea-AlbaladejoJDallPMet al.Cox regression survival analysis with compositional covariates: application to modelling mortality risk from 24-h physical activity patterns. Stat Methods Med Res2020; 29: 1447–1465.
42.
ŠtefelováNPalarea-AlbaladejoJHronKet al.Compositional PLS biplot based on pivoting balances: an application to explore the association between 24-h movement behaviours and adiposity. Comput Stat2024; 39: 835–863.
43.
HildebrandMHansenBvan HeesVet al.Evaluation of raw acceleration sedentary thresholds in children and adults. Scand J Med Sci Sports2017; 27: 1814–1823.
44.
van HeesVTSabiaSAndersonKNet al.A novel, open access method to assess sleep duration using a wrist-worn accelerometer. PLoS ONE2015; 10: e0142533.
45.
AitchisonJGreenacreM. Biplots of compositional data. J R Stat Soc C: Appl Stat2002; 51: 375–392.
46.
BenjaminiYHochbergY. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B: Methodol1995; 57: 289–300.
47.
CompernolleSVan DyckDDe CockerKet al.Differences in context-specific sedentary behaviors according to weight status in adolescents, adults and seniors: a compositional data analysis. Int J Environ Res2018; 15: 1916.