Abstract
Partial least squares discriminant analysis (PLS-DA) is often used for data sets that contain a large number of potential predictors but relatively few observations. In such data sets, chance correlations between predictors and response can occur and lead to false conclusions. Hence, there is a need for data adequacy testing before model building, but currently no such method exists. In this work we propose one that uses random permutations to destroy the correlation structure between predictor and response data. This produced normal distributions of chance correlation coefficients, which were used to find correlation coefficients in the non-permuted data that differed significantly from chance occurrences. Based on these distributions, we defined two novel null hypotheses: one to control for cases in which a true null hypothesis is incorrectly rejected (false positives) and the other for cases in which a false null hypothesis is not rejected (false negatives). To counter false positive errors, the standard significance levels were adjusted with predictor-based Bonferroni corrections. To counter false negative errors, we compared the true and permuted correlation coefficients in the distribution tails. The outcomes of the hypothesis tests then indicated whether or not PLS-DA models could be successfully built from these data sets. We also investigated how to determine the number of samples needed for a data set with a given number of predictors. Simulations showed that our method produced significantly fewer false positives than PLS-DA (P = 0.0018; error rate 12 times lower than that of PLS-DA) but significantly more false negatives (P = 0.0003; error rate 4.5 times higher than that of PLS-DA). Data from Raman spectroscopy showed that the method transferred to real data. By pre-screening such data, our method can aid in assessing whether to proceed with model building and, when the sample size needs to be increased, indicates by how much.
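The permutation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `permutation_screen`, the use of absolute Pearson correlations, the empirical (rather than normal-fit) p-values, and the synthetic data are all assumptions made for illustration. The tail comparison used against false negatives is not covered here. The sketch shuffles the class labels to break any predictor-response relationship, records the chance correlations, and flags predictors whose observed correlation is rare under permutation at a Bonferroni-adjusted level (alpha divided by the number of predictors).

```python
import numpy as np

def permutation_screen(X, y, n_perm=2000, alpha=0.05, seed=0):
    """Illustrative sketch: flag predictors whose correlation with y exceeds chance.

    X is an (n_samples, n_predictors) array with no constant columns;
    y is a 1-D array of numeric class labels (for example 0 and 1).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Standardize predictors once; Pearson r is then a mean of products.
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)

    def corr(labels):
        yc = (labels - labels.mean()) / labels.std()
        return Xc.T @ yc / n

    observed = np.abs(corr(y))
    exceed = np.zeros(p)
    for _ in range(n_perm):
        # Permuting the labels destroys the predictor-response structure.
        exceed += np.abs(corr(rng.permutation(y))) >= observed
    # Empirical p-value with the usual +1 correction (assumption of this sketch).
    p_values = (exceed + 1) / (n_perm + 1)
    # Predictor-based Bonferroni correction of the significance level.
    significant = p_values < alpha / p
    return observed, p_values, significant

# Hypothetical demo data: 40 samples, 20 predictors, only predictor 0 informative.
demo_rng = np.random.default_rng(1)
y_demo = np.repeat([0, 1], 20)
X_demo = demo_rng.normal(size=(40, 20))
X_demo[:, 0] += 5.0 * y_demo
observed, p_values, significant = permutation_screen(X_demo, y_demo)
```

In this sketch, a data set in which no predictor is flagged would be a candidate for collecting more samples before attempting PLS-DA. The number of permutations must be large enough that the smallest attainable p-value, 1 / (n_perm + 1), lies below the Bonferroni-adjusted threshold.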
