Beyond One-Size-Fits-All: A Differential Sensitivity Framework for Machine Learning–Based Detection of Anomalous Survey Responses

Abstract

Anomalous survey responses, including random, careless, extreme, acquiescent, straightline, and alternating responding, threaten the validity of survey-based research. Machine learning (ML) algorithms offer flexible, model-agnostic alternatives to traditional detection methods, yet their relative effectiveness across anomaly types remains poorly understood. This study evaluated 11 unsupervised anomaly detection algorithms spanning four paradigms (distance-based, density-based, reconstruction-based, and tree/boundary-based) against six simulated anomaly types embedded in a realistic survey dataset (N = 3,000). Results revealed pronounced differential sensitivity: globally deviant patterns (random, extreme, alternating) were universally detectable, whereas careless and acquiescent responding required reconstruction- or boundary-based methods, and straightline responding resisted detection by all algorithms (maximum area under the receiver operating characteristic curve [AUC-ROC] < .70). No single algorithm dominated across all types. These findings argue for multimethod approaches combining ML algorithms with traditional response quality indicators, and provide a framework for selecting detection methods based on anticipated anomaly types.
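The evaluation logic summarized above (embed simulated anomalies in otherwise plausible responses, score every respondent with an unsupervised detector, then compute AUC-ROC separately for each anomaly type) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the study: the 20-item, 5-point scale, the single-trait data generator, the 5% contamination rate, and the simple k-nearest-neighbor distance score, which stands in for the distance-based paradigm only. It is not expected to reproduce the reported results.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ITEMS, N_POINTS = 20, 5  # hypothetical instrument: 20 items, 5-point Likert scale


def simulate_normal(n):
    """Attentive respondents: one latent trait drives all items (illustrative assumption)."""
    trait = rng.normal(size=(n, 1))
    noise = rng.normal(scale=0.7, size=(n, N_ITEMS))
    return np.clip(np.rint(3 + trait + noise), 1, N_POINTS)


def simulate_anomaly(kind, n):
    """Two of the six anomaly types named in the abstract, in simplified form."""
    if kind == "random":
        # Every item answered uniformly at random.
        return rng.integers(1, N_POINTS + 1, size=(n, N_ITEMS)).astype(float)
    if kind == "straightline":
        # One randomly chosen category repeated across all items.
        chosen = rng.integers(1, N_POINTS + 1, size=(n, 1))
        return np.repeat(chosen, N_ITEMS, axis=1).astype(float)
    raise ValueError(f"unknown anomaly type: {kind}")


def knn_score(X, k=10):
    """Anomaly score = Euclidean distance to the k-th nearest other respondent."""
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    d2.sort(axis=1)  # column 0 is the zero distance of each row to itself
    return np.sqrt(d2[:, k])


def auc_roc(scores, labels):
    """AUC-ROC as the probability that an anomaly outscores a normal case (ties count half)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties


def evaluate(kind, n_normal=950, n_anom=50):
    """Embed one anomaly type at 5% contamination and score the pooled sample."""
    X = np.vstack([simulate_normal(n_normal), simulate_anomaly(kind, n_anom)])
    y = np.r_[np.zeros(n_normal), np.ones(n_anom)]
    return auc_roc(knn_score(X), y)


results = {kind: evaluate(kind) for kind in ("random", "straightline")}
for kind, auc in results.items():
    print(f"{kind:>12}: AUC-ROC = {auc:.2f}")
```

Reporting AUC-ROC per anomaly type, rather than pooling all anomalies into one label, is what makes differential sensitivity visible: a detector can rank one type well and another poorly, and a pooled score would hide the difference.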

Keywords

anomaly detection, machine learning, survey data quality, unsupervised learning, benchmarking

