XAIRF-WFP: a novel XAI-based random forest classifier for advanced email spam detection
Mohamed Aly Bouke, Omar Imhemed Alramli, Azizol Abdullah
DOI: https://doi.org/10.1007/s10207-024-00920-1
2024-11-01
International Journal of Information Security
Abstract: Spam detection is a critical cybersecurity and information management task with significant implications for security decision-making. Traditional machine learning algorithms such as Logistic Regression (LR), K-Nearest Neighbors (KNN), Decision Trees (DT), and Support Vector Machines (SVM) have been employed to address it. However, these algorithms often suffer from the "black box" dilemma: a lack of transparency that limits their applicability in security contexts, where understanding the reasoning behind a classification is essential for effective risk assessment and mitigation. To address this limitation, this paper applies Explainable Artificial Intelligence (XAI) principles to spam detection and presents a more transparent approach built on a Random Forest (RF) classifier. The methodology combines data balancing through Hybrid Random Sampling, feature selection using the Gini Index, and two-layer model explainability via Local Interpretable Model-agnostic Explanations (LIME) and SHapley Additive exPlanations (SHAP). The model achieved an accuracy of 94.8% with high precision and recall, outperforming LR, KNN, DT, and SVM across all key performance metrics. The results affirm the effectiveness of the proposed methodology, which offers a robust, interpretable, and reliable model for spam detection.
computer science, information systems, theory & methods, software engineering
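
The first three stages named in the abstract (hybrid random sampling, Gini-based feature selection, and a Random Forest classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the synthetic data from make_classification, the helper names hybrid_random_sample and select_by_gini, the midpoint balancing target, and the choice of 10 features are all assumptions made for illustration. The LIME and SHAP explanation layers are omitted because they depend on the separate lime and shap packages.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def hybrid_random_sample(X, y, rng):
    """Hypothetical helper: undersample the larger class and oversample the
    smaller one so every class ends up at the mean of the original sizes."""
    classes, counts = np.unique(y, return_counts=True)
    target = int(counts.mean())
    picked = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Sample with replacement only when the class has to grow.
        picked.append(rng.choice(idx, size=target, replace=count < target))
    picked = rng.permutation(np.concatenate(picked))
    return X[picked], y[picked]


def select_by_gini(X, y, k, seed):
    """Hypothetical helper: indices of the k features with the highest
    Gini (mean decrease in impurity) importance from a fitted forest."""
    forest = RandomForestClassifier(
        n_estimators=100, criterion="gini", random_state=seed
    )
    forest.fit(X, y)
    return np.argsort(forest.feature_importances_)[::-1][:k]


# Synthetic, imbalanced stand-in for a spam dataset (assumption).
rng = np.random.default_rng(0)
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=8,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Balance and select features on the training split only, to avoid leakage.
X_bal, y_bal = hybrid_random_sample(X_train, y_train, rng)
top = select_by_gini(X_bal, y_bal, k=10, seed=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_bal[:, top], y_bal)
accuracy = accuracy_score(y_test, model.predict(X_test[:, top]))
print(f"held-out accuracy: {accuracy:.3f}")
```

The test split is left at its original class ratio so that the reported accuracy reflects the imbalanced distribution rather than the resampled one.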