Improved LLE and neighborhood rough sets-based gene selection using Lebesgue measure for cancer classification on gene expression data

Abstract

Gene selection is an important preprocessing step for cancer classification and remains one of the most challenging problems in microarray data analysis. To handle gene expression data more effectively, this paper proposes a gene selection method for cancer classification that combines locally linear embedding (LLE) with neighborhood rough sets using the Lebesgue measure. First, because traditional LLE cannot effectively exploit category information and is sensitive to noise, an intra-class neighborhood is defined and a new way of calculating the reconstruction weights, which incorporates the Euclidean distance, is proposed to improve LLE. Then, the Lebesgue measure is introduced into neighborhood rough sets, a δ-neighborhood measure is defined, and the dependency degree and the significance measure are presented for neighborhood decision systems. Finally, an improved LLE and neighborhood rough sets-based gene selection algorithm is designed: the improved LLE algorithm reduces the initial dimensionality of the gene expression data to obtain a candidate gene subset, and a relative reduction based on the Lebesgue measure and the dependency degree further screens the candidates to select the final gene subset. Experimental results on several public gene expression data sets show that the proposed method effectively selects the most relevant genes and achieves high classification accuracy.
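To make the second stage more concrete, the following is a minimal sketch of dependency-degree-based screening in a neighborhood decision system. It is not the paper's algorithm: the abstract does not give the Lebesgue-measure definitions, so this sketch assumes the standard counting form of the dependency degree (the fraction of samples whose δ-neighborhood contains only samples of the same class) and a simple greedy forward search. The function names `delta_neighborhoods`, `dependency_degree`, and `greedy_reduct`, the tolerance `tol`, and the toy data are all hypothetical.

```python
import numpy as np


def delta_neighborhoods(X, delta):
    """Boolean matrix N with N[i, j] True when sample j lies within
    Euclidean distance delta of sample i (assumed neighborhood form)."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return dist <= delta


def dependency_degree(X, y, genes, delta):
    """Fraction of samples whose delta-neighborhood, computed on the
    chosen genes, contains only samples of the same class.
    Counting stands in for the paper's Lebesgue measure (assumption)."""
    if len(genes) == 0:
        return 0.0
    N = delta_neighborhoods(X[:, genes], delta)
    same_class = y[:, None] == y[None, :]
    # A sample is consistent if every neighbor shares its class label.
    consistent = np.all(~N | same_class, axis=1)
    return float(consistent.mean())


def greedy_reduct(X, y, delta, tol=1e-9):
    """Forward selection: add the gene with the largest gain in
    dependency degree (its significance) until no gene improves it."""
    selected = []
    remaining = list(range(X.shape[1]))
    current = 0.0
    while remaining:
        gains = [
            dependency_degree(X, y, selected + [g], delta) - current
            for g in remaining
        ]
        best = int(np.argmax(gains))
        if gains[best] <= tol:
            break
        selected.append(remaining.pop(best))
        current += gains[best]
    return selected, current


if __name__ == "__main__":
    # Toy data: gene 0 separates the two classes, gene 1 is constant.
    X_toy = np.array([[0.0, 0.5], [0.1, 0.5], [0.9, 0.5], [1.0, 0.5]])
    y_toy = np.array([0, 0, 1, 1])
    print(greedy_reduct(X_toy, y_toy, delta=0.2))
```

In the toy data, gene 0 alone makes every neighborhood class-pure, so the search stops after selecting it. In the method described above, such a search would run on the candidate subset produced by the improved LLE step rather than on all genes.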

Keywords

Rough sets, neighborhood rough sets, gene selection, locally linear embedding, cancer classification

