Abstract
Phishing emails remain a major cybersecurity threat, which calls for robust detection systems. This paper presents a comprehensive methodology for detecting phishing emails that combines natural language processing (NLP) techniques with ensemble learning methods, evaluated on a large-scale dataset. The ensemble methods studied are AdaBoost, XGBoost, Gradient Boosting Machine (GBM), Light Gradient Boosting Machine (LGBM), CatBoost, Extra Trees, and Random Forest, together with Stacking and Majority Voting ensembles. In the experiments, the Stacking ensemble achieved 98.89% accuracy, precision, recall, and F-measure, with a false positive rate (FPR) and false negative rate (FNR) of only 0.01 each. The Majority Voting ensemble also performed strongly, reaching 98.56% accuracy, precision, recall, and F-measure, an FPR of 0.02, and an FNR of 0.01. These findings show that modern ensemble approaches can detect phishing emails with high accuracy and low error rates. Combining NLP-based feature extraction with ensemble models therefore offers a viable approach to countering phishing attacks in real-world applications.
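As an illustration of the Majority Voting procedure and of the reported metrics, the following is a minimal, standard-library-only sketch. It is not the authors' implementation. It assumes binary labels (1 = phishing, 0 = legitimate), hard voting over already-computed base-classifier predictions, and an odd number of voters so that ties cannot occur. The helper names `majority_vote` and `binary_metrics` are hypothetical. The abstract does not say whether precision, recall, and F-measure are averaged over classes, so this sketch computes them for the phishing class only. Stacking, which trains a meta-learner on the base models' outputs, is not shown.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Hard majority voting (hypothetical helper).

    predictions[m][i] is base model m's predicted label for email i.
    With an odd number of models and binary labels there are no ties.
    """
    n_samples = len(predictions[0])
    voted = []
    for i in range(n_samples):
        counts = Counter(model_preds[i] for model_preds in predictions)
        voted.append(counts.most_common(1)[0][0])  # label with the most votes
    return voted


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Confusion-matrix metrics for the positive (phishing) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,  # legitimate emails flagged as phishing
        "fnr": fn / (fn + tp) if fn + tp else 0.0,  # phishing emails missed
    }


if __name__ == "__main__":
    # Toy predictions from three hypothetical base models on six emails.
    base_predictions = [
        [1, 0, 1, 1, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 1, 1, 0],
    ]
    voted = majority_vote(base_predictions)
    print(voted)  # -> [1, 0, 1, 1, 0, 0]
    print(binary_metrics([1, 0, 1, 0, 0, 1], voted))
```

In this toy run, the vote produces 2 true positives, 2 true negatives, 1 false positive, and 1 false negative. That gives an accuracy of 4/6 and an FPR and FNR of 1/3 each, which shows how FPR and FNR are computed from the same confusion matrix as accuracy.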
