MapReduce-based data mining algorithm in consumer behavior prediction

Abstract

In an era of increasingly complex information, mining consumer behavior is of great significance to enterprises and markets. However, current information processing technology struggles to mine and segment consumer data comprehensively. This study aims to analyze consumer behavior effectively. To address the shortcomings of current consumer behavior mining algorithms, it proposes an improved consumer behavior data mining algorithm based on the MapReduce model. In the experimental analysis, the proposed algorithm produced closer Mahalanobis distances than the two comparison algorithms, fuzzy C-means and density-based spatial clustering of applications with noise (DBSCAN), indicating that it clusters more effectively. On Anderson's Iris dataset, the average clustering accuracy of the K-means clustering algorithm (K-means) was 93.2%; on the Glass and Wine datasets, it was 94.3% and 93.8%, respectively. The proposed method categorized consumers into three classes based on their transaction frequency and transaction amount. Among the consumers in cluster 1, the total transaction amount was in the range of 0.62–0.82, the transaction frequency was between 0.41 and 0.72, and the number of transactions was between 0.72 and 0.94, which shows that the consumers in this cluster are moderately active, high-consumption customers. These results indicate that the method, through the collaborative optimization of MapReduce and K-means, achieves an accuracy of over 90% in cross-industry scenarios. It addresses the low efficiency, poor accuracy, and weak adaptability of traditional algorithms, providing a quantifiable technical solution for consumer behavior research.
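The abstract does not give the algorithm's details, so the following is only a minimal sketch of how K-means is commonly expressed in map/reduce form, not the authors' implementation. It simulates the two phases in a single Python process rather than on a Hadoop cluster. The function names, the use of squared Euclidean distance, and the toy consumer features (transaction amount and frequency, assumed here to be scaled to [0, 1]) are all illustrative assumptions.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points, centroids):
    """Map step: emit (index of nearest centroid, (point, 1)) for each point."""
    for p in points:
        dists = [sq_dist(p, c) for c in centroids]
        yield dists.index(min(dists)), (p, 1)


def reduce_phase(pairs, centroids):
    """Reduce step: sum points per key and return the per-cluster means.

    A centroid that received no points keeps its previous position.
    """
    sums = {}
    counts = defaultdict(int)
    for key, (p, n) in pairs:
        if key in sums:
            sums[key] = [s + x for s, x in zip(sums[key], p)]
        else:
            sums[key] = list(p)
        counts[key] += n
    updated = list(centroids)
    for key, total in sums.items():
        updated[key] = tuple(s / counts[key] for s in total)
    return updated


def kmeans_mapreduce(points, centroids, max_iter=50, tol=1e-9):
    """Repeat map and reduce until no centroid moves more than tol (squared)."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(sq_dist(c, u) for c, u in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


# Hypothetical consumers: (transaction amount, transaction frequency), both in [0, 1].
consumers = [
    (0.1, 0.1), (0.2, 0.1), (0.1, 0.2),
    (0.5, 0.5), (0.6, 0.5), (0.5, 0.6),
    (0.9, 0.9), (0.8, 0.9), (0.9, 0.8),
]
initial = [(0.1, 0.1), (0.5, 0.5), (0.9, 0.9)]
for centroid in kmeans_mapreduce(consumers, initial):
    print(centroid)
```

In a real MapReduce job the map step would run in parallel over data partitions and the reduce step would run once per cluster key, with the updated centroids broadcast to the mappers before the next iteration.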

Keywords

MapReduce; data mining algorithms; consumer behavior; prediction; clustering

