An efficient density-based clustering with side information and active learning: A case study for facial expression recognition task

Abstract

Data clustering is one of the most important tasks in machine learning and data mining. It aims to discover the natural structure of data, identify relationships between observations within data sets, or detect outliers. Clustering is traditionally seen as part of unsupervised learning, but in many situations side information about the clusters is available in addition to the feature values. For example, the cluster labels of some observations may be known (such observations are called seeds), or certain pairs of observations may be known to belong, or not to belong, to the same cluster (pairwise constraints). Clustering algorithms that use such information are called semi-supervised clustering algorithms. Although many semi-supervised clustering algorithms have been presented in the literature over the last decades, each of them usually uses only one kind of side information. In this work, we propose a new semi-supervised density-based clustering algorithm, named MCSSDBS, which effectively integrates both kinds of side information and embeds an active learning strategy in the process of finding clusters. To evaluate the proposed method and demonstrate its effectiveness compared with a state-of-the-art semi-supervised density-based clustering algorithm (SSDBSCAN), a series of experiments is carried out on both synthetic and real-world data sets. Experiments are first conducted on 6 data sets from the UCI repository. Then, for the facial expression recognition task specifically, tests are performed on two facial data sets: one popular in the literature, the extended Cohn-Kanade data set (CK+), and our own new facial data set collected from volunteers in Vietnam, named the ITI facial expression data set. The comparative results show that our method can improve clustering performance.
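To make the two kinds of side information concrete, the following minimal sketch shows how a set of seeds implies pairwise constraints: two seeds with the same label must share a cluster, and two seeds with different labels must not. It is an illustration only and not the MCSSDBS procedure. The function name `constraints_from_seeds`, the terms "must-link" and "cannot-link" (borrowed from the constrained-clustering literature), and the example labels are assumptions introduced here.

```python
from itertools import combinations


def constraints_from_seeds(seeds):
    """Derive pairwise constraints from seeds (illustrative helper, not from the paper).

    seeds: dict mapping an observation index to its known cluster label.
    Returns (must_link, cannot_link): two sets of index pairs (i, j) with i < j.
    """
    must_link, cannot_link = set(), set()
    # Visit every unordered pair of seeded observations exactly once.
    for i, j in combinations(sorted(seeds), 2):
        if seeds[i] == seeds[j]:
            must_link.add((i, j))      # same label: same cluster
        else:
            cannot_link.add((i, j))    # different labels: different clusters
    return must_link, cannot_link


# Hypothetical seeds: observations 0 and 3 are labelled "happy", observation 7 "sad".
seeds = {0: "happy", 3: "happy", 7: "sad"}
must_link, cannot_link = constraints_from_seeds(seeds)
print(sorted(must_link))
print(sorted(cannot_link))
```

The reverse direction does not hold in general: pairwise constraints say which observations go together, but not which cluster label they carry, which is one reason the two kinds of side information are usually handled separately.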

Keywords

Semi-supervised clustering; density-based clustering; active learning; side information; facial expression recognition
