Abstract
The oral microbiome is a complex environment made up of the diverse microorganisms that inhabit the oral cavity. More than 700 bacterial species live in the oral cavity, which provides them with nutrients. Because samples are often collected under varying non-biological conditions, batch effects arise. Batch effects are variations among otherwise comparable samples that are caused by differences in the equipment used, the time of collection, the laboratory conditions, and similar factors. They can be difficult to address because the variation may not be apparent in individual samples, only across groups of samples. Several methods have been proposed to remove batch effects, but they tend either to require a two-step approach (batch effect removal followed by classification) or to suffer from dropout events in gene expression data. In this study, we propose a one-step approach that combines batch effect removal and disease classification, eliminating the need for a two-step process. We use LassoNet with a batch loss to mitigate batch effects and classify disease outcomes from oral microbiome data simultaneously. The model outperformed our baseline models, reaching an average area under the curve of 0.8 across the five oral microbiome studies. A further advantage of LassoNet is that it supports feature importance analysis, which can reveal key oral microbes associated with disease outcomes.
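To make the one-step idea concrete, the following is a minimal sketch of a training objective that adds a batch term to a classification loss. The abstract does not define the batch loss, so everything here is an assumption for illustration: the penalty (mean squared distance between each batch's mean embedding and the global mean), the `batch_weight` parameter, and all function names are hypothetical. LassoNet's feature-sparsity penalty is omitted.

```python
import math
from collections import defaultdict


def binary_cross_entropy(probs, labels, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def batch_penalty(embeddings, batch_ids):
    """Assumed batch term: mean squared distance between each batch's
    mean embedding and the global mean embedding (zero when batches align)."""
    dim = len(embeddings[0])
    n = len(embeddings)
    global_mean = [sum(e[j] for e in embeddings) / n for j in range(dim)]

    # Group sample embeddings by the study (batch) they came from.
    groups = defaultdict(list)
    for e, b in zip(embeddings, batch_ids):
        groups[b].append(e)

    penalty = 0.0
    for members in groups.values():
        mean = [sum(e[j] for e in members) / len(members) for j in range(dim)]
        penalty += sum((m - g) ** 2 for m, g in zip(mean, global_mean))
    return penalty / len(groups)


def combined_loss(probs, labels, embeddings, batch_ids, batch_weight=1.0):
    """Single objective: classification loss plus a weighted batch term."""
    return (binary_cross_entropy(probs, labels)
            + batch_weight * batch_penalty(embeddings, batch_ids))
```

Minimizing one objective of this shape would push a model to classify disease outcomes while keeping its internal representation similar across studies, which is the sense in which removal and classification happen in a single step. The weight on the batch term would need tuning in practice.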
