Abstract
Multilocus variable number tandem repeat analysis (MLVA) is a molecular subtyping technique that remains useful for tracking and tracing bacterial contaminants where the resources for whole genome sequencing are not available. Unlike multilocus sequence typing (MLST) and pulsed-field gel electrophoresis, MLVA did not emerge as a standardized subtyping method for Listeria monocytogenes. As a result, there is no reference database of virulent or food-associated MLVA subtypes comparable to the one that exists for MLST-based clonal complexes (CCs). Building on the previously demonstrated close congruence of a 5-locus MLVA scheme with MLST, a predictive model was created using the XGBoost machine learning (ML) technique, which enabled the prediction of CCs from MLVA patterns with ∼85% (±4%) accuracy. As well as validating the model on existing data, a straightforward update protocol was simulated for cases in which previously unseen subtypes arise. This article illustrates how ML techniques can be applied with elementary coding skills to add value to previous-generation molecular subtyping data in built food processing environments.
