Sage Journals

Response mixture models based on supervised components: Clustering floristic taxa

Abstract

In this article, we propose to cluster responses in order to identify groups that are predicted by specific explanatory components. A response matrix is assumed to depend on a set of explanatory variables and a set of additional covariates. The explanatory variables are assumed to be numerous and redundant, which calls for dimension reduction and regularization. By contrast, the additional covariates are a few selected variables that are forced into the regression model because they require no regularization. The response matrix is assumed to be partitioned into several unknown groups of responses, and the responses in each group are assumed to be predictable from an appropriate number of group-specific orthogonal supervised components of the explanatory variables. The classification is based on a mixture model of the responses. To estimate the model, we propose a criterion extending that of Supervised Component-based Generalized Linear Regression (SCGLR), a Partial Least Squares-type method, and develop an algorithm that combines component-based model estimation with Expectation-Maximization (EM) estimation. The new methodology is tested on simulated data and then applied to a floristic ecology dataset.
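The estimation idea in the abstract alternates between assigning response columns to groups and re-estimating each group's supervised component. The sketch below is only an illustration of that alternation and is not SCGLR or the authors' algorithm: it assumes Gaussian responses, a single component per group, and replaces the SCGLR criterion with a simpler responsibility-weighted PLS-type direction (the leading eigenvector of X'YWY'X, where W holds the group responsibilities of the response columns). The function names (`response_mixture_em`, `weighted_pls_direction`, `fit_columns`) and the eigenvector-based initialization are hypothetical choices made for this example.

```python
import numpy as np


def weighted_pls_direction(X, Y, w):
    """Leading eigenvector of X' Y diag(w) Y' X.

    This direction u maximises sum_j w_j * (u' X' y_j)^2, a PLS-type criterion;
    it is a simplified stand-in, NOT the SCGLR criterion.
    """
    M = X.T @ (Y * w) @ Y.T @ X          # (p, p) symmetric PSD; w scales column j by w_j
    _, vecs = np.linalg.eigh(M)          # eigenvalues returned in ascending order
    return vecs[:, -1]                   # unit-norm leading eigenvector


def fit_columns(T, f, Y):
    """OLS of every response column on [T, f]; returns per-column ML residual variance."""
    D = np.column_stack([T, f])
    coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
    resid = Y - D @ coef
    return np.maximum((resid ** 2).mean(axis=0), 1e-12)   # shape (q,)


def response_mixture_em(X, T, Y, K, n_iter=50):
    """EM-style clustering of the columns of Y into K groups (hypothetical sketch).

    X: (n, p) explanatory variables, T: (n, r) covariates (include an intercept),
    Y: (n, q) responses. Returns hard labels, responsibilities R (q, K) and
    one component direction per group U (p, K).
    """
    n, q = Y.shape
    Xc = X - X.mean(axis=0)
    # Assumption: components target only the response variation T does not explain.
    Yr = Y - T @ np.linalg.lstsq(T, Y, rcond=None)[0]
    # Assumption: deterministic start from the top-K unweighted PLS-type directions.
    _, vecs = np.linalg.eigh(Xc.T @ Yr @ Yr.T @ Xc)
    U = vecs[:, ::-1][:, :K].copy()
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: profile Gaussian log-likelihood of column j under group k, with
        # response-specific coefficients and variance fitted by OLS on [T, X u_k].
        loglik = np.empty((q, K))
        for k in range(K):
            s2 = fit_columns(T, Xc @ U[:, k], Y)
            loglik[:, k] = np.log(max(pi[k], 1e-300)) - 0.5 * n * (np.log(2 * np.pi * s2) + 1)
        loglik -= loglik.max(axis=1, keepdims=True)   # stabilise before exponentiating
        R = np.exp(loglik)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: mixing proportions, then one weighted direction per group.
        pi = R.mean(axis=0)
        for k in range(K):
            U[:, k] = weighted_pls_direction(Xc, Yr, R[:, k])
    return R.argmax(axis=1), R, U
```

On simulated data where half of the responses load on one linear combination of X and the other half on a different one, the returned labels should separate the two halves up to label switching. Because the direction update maximises a covariance criterion rather than the likelihood itself, this iteration is only EM-like and is not guaranteed to increase the likelihood monotonically; handling several orthogonal components per group and non-Gaussian responses would require the fuller machinery the abstract refers to.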

Keywords

EM algorithm; Response mixture; SCGLR; Supervised components; Taxa classification

