Abstract
This paper tackles a new problem in outlier detection: how to promptly detect local outliers in large-scale mixed-attribute data in the big data era. The problem is challenging because no individual compute machine has access to the entire mixed-attribute dataset. The proposed approaches first apply a mechanism that discards the large mass of points that are clearly not noise and extracts a cluster-based pre-noise set. We then analyze the pre-noise set with a multi-step distributed LOF (local outlier factor) computation on the Spark platform. The final output is a list of points ordered by LOF. Comprehensive experiments on large-scale benchmark datasets using the Spark platform show that our approaches outperform state-of-the-art techniques, running 4X faster than the baseline LOF and 2X faster than DLOF. We therefore believe they can give better guidance for local outlier detection in mixed-attribute data.
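To make the pipeline concrete, here is a single-machine sketch of the LOF scoring stage applied to a pre-noise set. It rests on assumptions the abstract does not state. The mixed-attribute distance is HEOM-style: range-normalised differences for numeric attributes and a 0/1 mismatch for categorical ones. The function names (`attribute_ranges`, `mixed_distance`, `k_neighbourhood`, `ranked_lof`) are hypothetical. The `candidates` indices stand in for the output of the cluster-based pruning step, whose rule is not given here. This is not the authors' distributed implementation. The per-point steps (k-distance, local reachability density, LOF) are the kind of work that would be partitioned across Spark executors. Neighbours are drawn from the full dataset, so pruning reduces how many points are scored without changing their neighbourhoods.

```python
import math
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[float, str]
Record = Tuple[Value, ...]


def attribute_ranges(data: Sequence[Record]) -> List[float]:
    """Range of each numeric column; categorical (str) columns get a dummy 1.0."""
    ranges = []
    for column in zip(*data):
        if isinstance(column[0], str):
            ranges.append(1.0)
        else:
            ranges.append(max(column) - min(column))
    return ranges


def mixed_distance(a: Record, b: Record, ranges: Sequence[float]) -> float:
    """HEOM-style distance (an assumption): range-normalised numeric difference,
    0/1 mismatch for categorical attributes."""
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if isinstance(x, str):
            d = 0.0 if x == y else 1.0
        else:
            d = abs(x - y) / r if r > 0 else 0.0
        total += d * d
    return math.sqrt(total)


def k_neighbourhood(i: int, data: Sequence[Record], k: int,
                    ranges: Sequence[float]) -> Tuple[float, List[int]]:
    """k-distance of point i and its k-neighbourhood (ties at the k-distance kept)."""
    dists = sorted(
        (mixed_distance(data[i], data[j], ranges), j)
        for j in range(len(data)) if j != i
    )
    k_dist = dists[k - 1][0]
    return k_dist, [j for d, j in dists if d <= k_dist]


def ranked_lof(data: Sequence[Record], candidates: Sequence[int],
               k: int = 3) -> List[Tuple[int, float]]:
    """LOF of each candidate (neighbours drawn from the full data), highest first."""
    ranges = attribute_ranges(data)
    cache: Dict[int, Tuple[float, List[int]]] = {}

    def hood(i: int) -> Tuple[float, List[int]]:
        # Memoise neighbourhood queries; only points reachable from candidates are computed.
        if i not in cache:
            cache[i] = k_neighbourhood(i, data, k, ranges)
        return cache[i]

    def lrd(i: int) -> float:
        # Local reachability density: inverse mean of reach-dist_k(i, o) = max(k-dist(o), d(i, o)).
        _, nbrs = hood(i)
        reach = [max(hood(o)[0], mixed_distance(data[i], data[o], ranges)) for o in nbrs]
        return 1.0 / max(sum(reach) / len(reach), 1e-12)  # guard against duplicate points

    scores = []
    for p in candidates:
        _, nbrs = hood(p)
        # LOF(p) = mean lrd of p's neighbours divided by lrd(p).
        lof = sum(lrd(o) for o in nbrs) / (len(nbrs) * lrd(p))
        scores.append((p, lof))
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

For example, take five tightly grouped records of category "a", such as `(1.0, 1.0, "a")` through `(1.05, 1.05, "a")`, plus one distant record `(5.0, 5.0, "b")`. Passing `candidates=[0, 5]` ranks the distant record first with a LOF far above 1, while index 0 scores close to 1.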
