Feature selection for incomplete set-valued data

Abstract

Set-valued data are an important kind of data; examples include data obtained from different search engines, market data, and patients’ symptoms and behaviours. An information system (IS) based on incomplete set-valued data is called an incomplete set-valued information system (ISVIS), which is a generalized model of a single-valued incomplete information system. This paper studies feature selection for an ISVIS by means of uncertainty measurement. First, the similarity degree between two information values on a given feature of an ISVIS is proposed. Then, the tolerance relation on the object set with respect to a given feature subset in an ISVIS is obtained. Next, λ-reduction in an ISVIS is presented. Moreover, connections between the proposed feature selection and uncertainty measurement are exhibited. Lastly, feature selection algorithms based on the λ-discernibility matrix, λ-information granulation, λ-information entropy and λ-significance in an ISVIS are provided. To demonstrate the practical significance of the provided algorithms, a numerical experiment is carried out, and the experimental results report the number of features and the average size of features selected by each feature selection algorithm.

Keywords

Rough set theory, ISVIS, feature selection, similarity degree, algorithm
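The pipeline outlined in the abstract (similarity degree, then tolerance relation, then λ-reduction) can be illustrated with a minimal sketch. The abstract does not state the paper's actual definitions, so everything here is an assumption made for illustration: the similarity degree is taken to be the Jaccard index, a missing value (marked `None`) is treated as fully similar to any value, and a reduct is taken to be a minimal feature subset that induces the same tolerance relation as the full feature set. The names `similarity`, `tolerance_relation` and `lambda_reducts` are hypothetical, and the brute-force search stands in for the discernibility-matrix and entropy-based algorithms the paper provides.

```python
from itertools import combinations

MISSING = None  # assumed marker for an unknown (incomplete) set value


def similarity(u, v):
    """Assumed similarity degree: Jaccard index; a missing value matches anything."""
    if u is MISSING or v is MISSING:
        return 1.0
    if not u and not v:
        return 1.0
    return len(u & v) / len(u | v)


def tolerance_relation(table, features, lam):
    """Pairs (i, j) whose similarity is at least lam on every given feature."""
    n = len(table)
    return {
        (i, j)
        for i in range(n)
        for j in range(n)
        if all(similarity(table[i][a], table[j][a]) >= lam for a in features)
    }


def lambda_reducts(table, features, lam):
    """Brute-force search for minimal subsets preserving the full tolerance relation."""
    target = tolerance_relation(table, features, lam)
    reducts = []
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            # Skip supersets of a reduct already found: they cannot be minimal.
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if tolerance_relation(table, subset, lam) == target:
                reducts.append(subset)
    return reducts


# Toy ISVIS with three objects and three set-valued features (made-up data).
table = [
    {"a": {1, 2}, "b": {1}, "c": {3}},
    {"a": {1, 2}, "b": {2}, "c": {3}},
    {"a": MISSING, "b": {1}, "c": {3, 4}},
]
features = ("a", "b", "c")
print(lambda_reducts(table, features, lam=0.5))
```

On this toy table only feature `b` separates any objects at λ = 0.5, so the search keeps `b` alone. The exhaustive search is exponential in the number of features and is meant only to make the definitions concrete.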
