Unsupervised active learning techniques for labeling training sets: An experimental evaluation on sequential data

Abstract

Many real-world applications, such as those related to sensors, allow collecting large amounts of inexpensive unlabeled sequential data. However, the use of supervised machine learning methods is frequently hindered by the high cost of gathering labels for such data, since these methods assume that a considerable amount of labeled data is available to build an accurate classification model. To overcome this bottleneck, active learning methods are designed to selectively label the most informative examples instead of requesting all true labels. Although active learning has been widely used in many problems, most methods assume the presence of labeled data or some prior knowledge about the problem, such as the number of classes. In contrast, in this paper we are interested in the realistic scenario where active learning is performed from scratch on a fully unlabeled dataset, without any classifier or prior knowledge about the data. In general, methods that consider fully unlabeled data use random sampling to select the examples to label. The goal of this work is to present a broad experimental evaluation of different unsupervised active learning methods for selecting examples from fully unlabeled sequential data. We evaluated methods based on clustering algorithms and on graph centrality measures for instance selection, as well as the performance of supervised and semi-supervised learning algorithms in the classification task. Based on our evaluation on a benchmark of sequential data and on a case study of insect species classification, we recommend sampling based on hierarchical clustering or k-Means. These methods show statistically significantly better performance than the popular random sampling. In addition, they are simple algorithms that are readily available in many software packages.
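The clustering-based sampling recommended above can be illustrated with a small sketch. This is a minimal example under stated assumptions, not the authors' implementation: it assumes equal-length numeric sequences compared with Euclidean distance (the study may use other measures), a labeling budget equal to the number of clusters k, and that the example nearest each k-Means centroid is the one sent to the annotator. The helpers `kmeans_select` and `random_select` are hypothetical names; the latter stands in for the random-sampling baseline.

```python
import math
import random
from typing import List, Optional, Sequence

Series = Sequence[float]  # assumption: equal-length, numeric sequences


def euclidean(a: Series, b: Series) -> float:
    """Euclidean distance (an assumed choice; other measures, e.g. DTW, could be used)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans_select(data: List[Series], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Hypothetical helper: run Lloyd's k-Means and return, for each non-empty
    cluster, the index of the example closest to its centroid (to be labeled)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct examples chosen at random.
    centroids = [list(data[i]) for i in rng.sample(range(len(data)), k)]
    assign: Optional[List[int]] = None
    for _ in range(iters):
        # Assignment step: each example joins its nearest centroid.
        new_assign = [min(range(k), key=lambda c: euclidean(x, centroids[c])) for x in data]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [data[i] for i, a in enumerate(assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    selected = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            # The member nearest the centroid acts as the cluster's representative.
            selected.append(min(members, key=lambda i: euclidean(data[i], centroids[c])))
    return sorted(selected)


def random_select(n: int, budget: int, seed: int = 0) -> List[int]:
    """Random-sampling baseline: pick `budget` distinct indices uniformly."""
    return sorted(random.Random(seed).sample(range(n), budget))


if __name__ == "__main__":
    # Toy "sequences" forming two well-separated groups.
    data = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1],
            [10.0, 10.1], [10.2, 10.0], [10.1, 10.1]]
    print(kmeans_select(data, k=2))  # -> [2, 5]
    print(random_select(len(data), budget=2))
```

Picking the member nearest each centroid yields one representative per cluster, so every dense region of the data is intended to receive a label, whereas random sampling may miss small groups. Hierarchical clustering (for example, with Ward's linkage) could be swapped in for k-Means in the same way.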

Keywords

Unsupervised active learning; training set labeling; clustering; centrality measures; sequential data
