Abstract
Big Data generally refers to massive quantities of data that are difficult to manipulate and process using traditional data management tools. Hadoop is a framework that introduces new techniques for processing, storing, and managing such large data sets across a cluster of machines, or nodes. MapReduce, one of Hadoop's core components, enables parallel processing by dividing a workload into smaller tasks that are distributed across multiple nodes. The scheduling technique adopted can significantly affect both the data locality rate across the cluster and the overall performance of MapReduce. In this paper, we present a hybrid scheduling algorithm that combines a locality index with dynamic job priorities when distributing tasks among nodes, with the aim of improving Hadoop MapReduce performance by raising the data locality rate and reducing workload processing time. Experimental results show that the proposed algorithm achieves shorter processing times and a higher locality rate than the default Hadoop schedulers (FIFO, FAIR, and Capacity) while ensuring efficient resource utilization.
