Abstract
In this paper, we address the robustness issue of maximum-likelihood-based methods in data clustering. Probabilistic mixture models are a well-known approach to cluster analysis. However, because they rely on maximum likelihood estimation (MLE), the resulting algorithms are often very sensitive to noise and outliers. In this work, we implement a variant of classical mixture model-based clustering (M2C) following a proposed general framework for handling outliers. A Genetic Algorithm (GA) is incorporated into the framework to produce a novel algorithm called GA-based Partial M2C (GA-PM2C). Analytical and experimental studies show that GA-PM2C can overcome the negative impact of outliers in data clustering and hence provides highly accurate and reliable clustering results. It also exhibits excellent consistency in performance and low sensitivity to initialization.
