Abstract
Anomalous survey responses, including random, careless, extreme, acquiescent, straightline, and alternating responding, threaten the validity of survey-based research. Machine learning (ML) algorithms offer flexible, model-agnostic alternatives to traditional detection methods, yet their relative effectiveness across anomaly types remains poorly understood. This study evaluated 11 unsupervised anomaly detection algorithms spanning four paradigms (distance-based, density-based, reconstruction-based, and tree/boundary-based) against six simulated anomaly types embedded in a realistic survey dataset (N = 3,000). Results revealed pronounced differential sensitivity: globally deviant patterns (random, extreme, alternating) were universally detectable, whereas careless and acquiescent responding required reconstruction- or boundary-based methods, and straightline responding resisted detection by all algorithms (maximum area under the receiver operating characteristic curve [AUC-ROC] < .70). No single algorithm dominated across all types. These findings argue for multimethod approaches combining ML algorithms with traditional response quality indicators, and provide a framework for selecting detection methods based on anticipated anomaly types.
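To make the evaluation logic concrete, here is a minimal sketch. It is not the study's simulation or code. The sketch assumes a one-factor, 5-point Likert generator for normal respondents and injects two of the six anomaly types (random and straightline responding). It scores every respondent with one simple distance-based detector, the mean Euclidean distance to the k = 10 nearest other respondents, and computes AUC-ROC per anomaly type through the rank identity: the probability that a randomly chosen anomalous respondent outscores a randomly chosen normal one, with ties counting one half. The function names `simulate_survey`, `knn_scores`, and `auc_roc`, along with every parameter value, are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_survey(n_normal=300, n_anom=30, n_items=20, n_points=5, seed=0):
    """Hypothetical Likert data: a single latent trait drives normal responses;
    random and straightline responders are appended as anomalies."""
    rng = np.random.default_rng(seed)
    trait = rng.normal(size=(n_normal, 1))
    loadings = rng.uniform(0.5, 1.0, size=(1, n_items))
    latent = trait @ loadings + rng.normal(scale=0.5, size=(n_normal, n_items))
    # Cut continuous latent responses into categories 1..n_points.
    cuts = np.linspace(-1.5, 1.5, n_points - 1)
    normal = 1 + np.digitize(latent, cuts)
    # Random responding: each item drawn uniformly from the scale.
    random = rng.integers(1, n_points + 1, size=(n_anom, n_items))
    # Straightline responding: one category repeated across all items.
    straight = np.repeat(rng.integers(1, n_points + 1, size=(n_anom, 1)), n_items, axis=1)
    X = np.vstack([normal, random, straight]).astype(float)
    labels = np.array(["normal"] * n_normal + ["random"] * n_anom + ["straightline"] * n_anom)
    return X, labels


def knn_scores(X, k=10):
    """Distance-based anomaly score: mean distance to the k nearest other rows."""
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each respondent from its own neighbors
    return np.sort(dist, axis=1)[:, :k].mean(axis=1)


def auc_roc(pos_scores, neg_scores):
    """AUC-ROC as P(anomaly score > normal score), counting ties as 1/2."""
    greater = (pos_scores[:, None] > neg_scores[None, :]).sum()
    ties = (pos_scores[:, None] == neg_scores[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    X, labels = simulate_survey()
    scores = knn_scores(X, k=10)
    normal_scores = scores[labels == "normal"]
    for kind in ("random", "straightline"):
        print(kind, round(auc_roc(scores[labels == kind], normal_scores), 3))
```

In this toy setup, random responders sit far from the correlated response patterns of normal respondents. Straightliners differ: identical constant rows cluster tightly with one another, and a midpoint straightline lies close to the many moderate normal respondents. A nearest-neighbor distance score can therefore rank them as unremarkable. This illustrates one plausible mechanism only and is not offered as the study's explanation. For real analyses, a maintained implementation such as scikit-learn's `roc_auc_score` would normally replace the hand-written `auc_roc`.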
