Safe semi-supervised classification algorithm combined with active learning sampling strategy

Abstract

To improve the performance of semi-supervised learning, a safe semi-supervised classification algorithm based on an active learning sampling strategy is proposed. First, an active learning sampling method based on uncertainty and representativeness is designed. A weighted combination of uncertainty and representativeness is used to select informative and representative unlabeled samples, which are then supplied to semi-supervised learning. Second, a label prediction method based on grouping verification is designed. Each unlabeled sample selected by active learning is pre-labeled: the sample with a pseudo-label is added to the labeled sample set, which is then grouped, trained, and tested. The error corresponding to each pseudo-label is calculated, and the pseudo-label that yields the lowest error is selected as the candidate label of the unlabeled sample. Third, a security verification method is designed. The candidate label is accepted as the final label of the unlabeled sample, and used to expand the labeled sample set, only if it makes the error lower than before. These iterations are repeated until a specified precision is reached. Finally, the classifier is trained on the final labeled set. Experiments on semi-supervised datasets and UCI datasets show that the proposed algorithm is effective.
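The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the uncertainty measure, the representativeness measure, the weighting, the grouping scheme, or the base classifier. Here entropy, a Gaussian-kernel mean similarity, a weight `alpha`, interleaved groups, and a nearest-centroid classifier are all stand-ins chosen for brevity, and every function name is hypothetical.

```python
import math

def entropy(probs):
    """Uncertainty of a class-probability vector (assumed measure: Shannon entropy)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def representativeness(i, pool):
    """Mean Gaussian-kernel similarity of pool[i] to the other pool samples (assumed measure)."""
    sims = [math.exp(-sum((a - b) ** 2 for a, b in zip(pool[i], y)))
            for j, y in enumerate(pool) if j != i]
    return sum(sims) / len(sims) if sims else 0.0

def select_sample(pool, predict_proba, alpha=0.5):
    """Step 1: index of the sample maximizing the weighted uncertainty/representativeness score."""
    scores = [alpha * entropy(predict_proba(x)) + (1 - alpha) * representativeness(i, pool)
              for i, x in enumerate(pool)]
    return max(range(len(pool)), key=scores.__getitem__)

def centroids(data):
    """Train the stand-in classifier: one mean vector per class."""
    sums, counts = {}, {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, v in enumerate(x):
            acc[d] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda y: sum((p - q) ** 2 for p, q in zip(model[y], x)))

def grouped_error(data, k=3):
    """Mean test error over k interleaved groups: train on the rest, test on one group."""
    errors = []
    for g in range(k):
        test = data[g::k]
        train = [d for i, d in enumerate(data) if i % k != g]
        if not test or not train:
            continue
        model = centroids(train)
        errors.append(sum(predict(model, x) != y for x, y in test) / len(test))
    return sum(errors) / len(errors) if errors else 1.0

def safe_label(labeled, x, classes, k=3):
    """Steps 2 and 3: choose the pseudo-label with the lowest grouped error,
    and accept it only if that error is strictly lower than before adding x."""
    baseline = grouped_error(labeled, k)
    errs = {c: grouped_error(labeled + [(x, c)], k) for c in classes}
    best = min(classes, key=errs.get)
    return best if errs[best] < baseline else None

# Tiny one-dimensional example: class 1 has only one labeled sample,
# so adding a correctly pseudo-labeled neighbor lowers the grouped error.
labeled = [((0.0,), 0), ((0.3,), 0), ((10.0,), 1)]
print(safe_label(labeled, (9.8,), [0, 1]))
```

With the strict "lower than before" rule used in this sketch, a sample is never added once the grouped error on the labeled set has already reached zero; whether the original method treats ties the same way is not stated in the abstract.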

Keywords

Active learning, sample, semi-supervised learning, safety, label prediction, grouping verification

