Abstract
Classical attribute reduction algorithms based on attribute significance launch too many jobs (O(|C|^2), where C is the set of condition attributes) when run in MapReduce. To improve the efficiency of these algorithms, we propose a novel reduction algorithm. Instead of relying on attribute significance, it uses the notion of a core attribute to construct a new heuristic reduction algorithm that needs only |C| jobs to obtain a reduct. The algorithm involves only two basic operations: compare and sort. The latter is optimised using the shuffle mechanism in MapReduce, which provides efficient sorting for big data. In particular, the jobs are chained iteratively so that the output of each job is passed as input to the next. Experimental results demonstrate that the proposed attribute reduction algorithm is efficient and significantly improves on the classical algorithms in both runtime and number of jobs.
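The sort-and-compare idea can be illustrated with a minimal single-machine sketch. It is not the authors' MapReduce implementation: the function names (`is_consistent`, `reduct_by_core_test`), the row layout, and the assumption that the decision table is consistent on the full attribute set are all hypothetical choices made for illustration. Each of the |C| passes tests whether one attribute is indispensable (core) relative to the attributes still kept, by sorting the rows on the remaining attributes and comparing neighbours.

```python
from typing import List, Sequence, Tuple

# A row is (condition attribute values, decision value). Hypothetical layout.
Row = Tuple[Tuple[int, ...], int]


def is_consistent(rows: Sequence[Row], attrs: Sequence[int]) -> bool:
    """Sort rows by their projection onto attrs, then compare neighbours.

    Returns False if two rows agree on attrs but differ in decision.
    """
    projected = sorted((tuple(cond[a] for a in attrs), dec) for cond, dec in rows)
    for (key1, dec1), (key2, dec2) in zip(projected, projected[1:]):
        if key1 == key2 and dec1 != dec2:
            return False
    return True


def reduct_by_core_test(rows: Sequence[Row], n_attrs: int) -> List[int]:
    """One pass per attribute: drop it unless dropping it breaks consistency.

    Assumes the table is consistent on all n_attrs attributes.
    """
    kept = list(range(n_attrs))
    for a in range(n_attrs):
        candidate = [b for b in kept if b != a]
        if is_consistent(rows, candidate):
            kept = candidate  # a is dispensable given the current set
        # otherwise a is core relative to kept and stays in the reduct
    return kept


# Toy table: the decision equals attribute 1; attributes 0 and 2 are redundant.
rows = [
    ((0, 0, 1), 0),
    ((0, 1, 1), 1),
    ((1, 0, 0), 0),
    ((1, 1, 0), 1),
]
print(reduct_by_core_test(rows, 3))  # prints [1]
```

In a MapReduce setting, the sort in `is_consistent` would presumably correspond to the shuffle phase, with the projected attribute values as the key, and each loop iteration to one chained job.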
