Abstract
The self-training algorithm trains a supervised classifier from a small set of labeled samples together with a large set of unlabeled samples. Despite its considerable success, self-training suffers from mislabeled samples. Local noise filters are designed to detect such samples, but they have two major shortcomings: (a) current local noise filters do not adequately account for the spatial distribution of the nearest neighbors across different classes, and (b) they perform poorly when mislabeled samples lie in the overlapping regions of different classes. Here, we develop an integrated architecture, a self-training algorithm based on density peaks combined with a globally adaptive multi-local noise filter (STDP-GAMLNF), to improve detection efficiency. First, density peak clustering reveals the spatial structure of the data set, which guides the self-training process in labeling unlabeled samples. Then, after each labeling epoch, GAMLNF comprehensively judges from multiple classes whether a sample is mislabeled, effectively reducing the influence of edge samples. Experimental results on eighteen UCI data sets demonstrate that GAMLNF is not sensitive to the value of the neighbor parameter.
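To make the density-peak component concrete, the following is a minimal sketch of the two quantities that density peak clustering (Rodriguez and Laio's formulation) computes for each point: the local density rho and the distance delta to the nearest point of higher density. This is an illustrative implementation, not the authors' STDP-GAMLNF code; the cutoff distance `d_c` is a free parameter chosen by the user.

```python
import numpy as np

def density_peaks(X, d_c):
    """Compute the two density-peak quantities for each row of X:
    rho[i]   -- local density: number of other points within cutoff d_c
    delta[i] -- distance to the nearest point of strictly higher density
                (for the globally densest point(s), the maximum distance).
    Cluster centers are the points with both high rho and high delta.
    """
    n = len(X)
    # Pairwise Euclidean distance matrix via broadcasting.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Count neighbors inside the cutoff, excluding the point itself.
    rho = (dist < d_c).sum(axis=1) - 1
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]
        if higher.size:
            delta[i] = dist[i, higher].min()
        else:
            # No denser point exists: use the largest distance as a sentinel.
            delta[i] = dist[i].max()
    return rho, delta
```

In a self-training setting, the resulting rho/delta structure can be used to decide the order in which unlabeled samples are labeled, propagating labels from dense regions outward.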
