An efficient attribute reduction algorithm using MapReduce

Abstract

Classical attribute reduction algorithms based on attribute significance launch too many jobs, O(|C|²), when run on MapReduce, where C denotes the set of condition attributes. To improve the efficiency of these algorithms, we propose a novel reduction algorithm. Instead of relying on attribute significance, it uses the notion of a core attribute to build a new heuristic reduction algorithm that needs only |C| jobs to obtain a reduct. The algorithm consists of just two basic operations: compare and sort. Sorting is optimised using the shuffle mechanism in MapReduce, which sorts big data efficiently. Jobs are chained iteratively, so that the output of each job is passed as input to the next. Experimental results show that the proposed attribute reduction algorithm is efficient and significantly improves on the classical algorithms in both runtime and number of jobs.

Keywords

Attribute reduction; MapReduce; rough set; shuffle mechanism; sort technology
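
The idea summarised in the abstract, one pass per condition attribute using only sort and compare, can be illustrated with a minimal single-machine sketch. This is not the authors' implementation: the function names `positive_region_size` and `reduce_attributes`, the row layout (a tuple of condition values paired with a decision), and the use of positive-region size as the dispensability test are all assumptions made for illustration. An in-memory `sorted` call stands in for the MapReduce shuffle, and each loop iteration stands in for one job.

```python
from itertools import groupby


def positive_region_size(rows, attrs):
    """Count rows whose values on `attrs` determine the decision uniquely.

    `rows` is a list of (condition_values, decision) pairs (assumed layout).
    """
    def key(row):
        return tuple(row[0][a] for a in attrs)

    # Sort step: stands in for the MapReduce shuffle, which brings
    # records with equal keys next to each other.
    ordered = sorted(rows, key=key)

    size = 0
    for _, group in groupby(ordered, key=key):
        group = list(group)
        # Compare step: the group is consistent if all decisions agree.
        if len({decision for _, decision in group}) == 1:
            size += len(group)
    return size


def reduce_attributes(rows, num_attrs):
    """Backward elimination with one pass (one hypothetical job) per attribute."""
    reduct = list(range(num_attrs))
    target = positive_region_size(rows, reduct)
    for a in range(num_attrs):
        candidate = [b for b in reduct if b != a]
        # Attribute `a` is dispensable if dropping it keeps the positive region.
        if positive_region_size(rows, candidate) == target:
            reduct = candidate  # the result is carried into the next pass
    return reduct


# Toy decision table: three condition attributes, one decision.
rows = [
    ((0, 0, 1), "n"),
    ((0, 1, 1), "y"),
    ((1, 0, 0), "y"),
    ((1, 1, 0), "y"),
]
print(reduce_attributes(rows, 3))
```

On this toy table the third attribute mirrors the first, so the first is dropped and the sketch keeps attributes 1 and 2. The loop runs exactly once per attribute, which matches the |C| job count stated in the abstract, whereas a significance-based search would re-evaluate every remaining attribute in every round.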

