Comparative Analysis of Black-Box and White-Box Machine Learning Model in Phishing Detection

Abdullah Fajar, Setiadi Yazid, Indra Budi
2024-12-03
Abstract: Background: Explainability in phishing detection models can support further phishing mitigation by increasing trust and clarifying how phishing is detected. Objective: This study aims to determine the best recommendation for applying an approach whose components can fulfil these critical needs. Methods: The methodology starts by analyzing both black-box and white-box models to identify their pros and cons specifically in phishing detection. The conclusions of the analysis are then validated experimentally using a set of well-known algorithms and public phishing datasets. The experimental metrics cover three measurements, including predictive accuracy and explainability metrics. Conclusion: Both model types are comparable in terms of interpretability and consistency, with room for improvement on diverse datasets. EBM, as an example of a white-box model, is generally better suited to applications requiring explainability and actionable insights. Finally, both white-box and black-box models have positive and negative aspects in terms of performance metrics and explainability metrics, so it is important to consider the objective of model usage.
Cryptography and Security, Artificial Intelligence
### What problem does this paper attempt to solve?

This paper addresses the trade-off between model explainability and prediction accuracy in phishing attack detection. Specifically, the research objectives are:

1. **Determine the advantages and disadvantages of black-box and white-box models in phishing detection**: Compare black-box models (such as deep neural networks and random forests) with white-box models (such as decision trees and logistic regression) to understand how each performs in phishing detection.
2. **Provide the best recommendation**: Based on the analysis results, select the most appropriate model type for practical applications so that key requirements are met, especially in scenarios where explainability is required.
3. **Enhance understanding of phishing attacks and trust in their detection**: By improving model explainability, increase users' trust in the phishing detection system and help them understand how phishing attacks are detected.
4. **Evaluate the performance and explainability of different models**: Use public datasets and a set of well-known algorithms for experimental verification, and evaluate the models in terms of prediction accuracy and explainability.
5. **Explore the application of Explainable Artificial Intelligence (XAI) techniques**: Investigate how XAI techniques (such as LIME and SHAP) can explain the decision-making process of black-box models, thereby enhancing their transparency and credibility.

### Research background

Phishing is a common form of cybercrime that poses serious security threats to individuals and organizations. Machine learning models are widely used to detect phishing attacks, but the internal decision-making of many models (especially black-box models) is difficult to explain, which limits their use in high-risk areas. Improving model explainability has therefore become an important research direction.

### Main objectives

- **Improve the transparency of the phishing detection system**: By explaining the model's decision-making process, enable users to understand why a given website or email is classified as phishing.
- **Enhance user trust**: By providing clear explanations, give users more confidence in the results of the detection system.
- **Optimize model selection**: Based on specific requirements (such as prediction accuracy and explainability), recommend the most suitable model type for practical applications.

### Methods

- **Analyze black-box and white-box models**: Evaluate the advantages and disadvantages of the different model types along multiple dimensions (such as prediction accuracy, explainability, and consistency).
- **Experimental verification**: Run experiments on public datasets to verify the performance of the different models, and use statistical tests to ensure the reliability of the results.
- **Apply XAI techniques**: For black-box models, use techniques such as LIME and SHAP to generate explanations and improve transparency.

### Conclusions

Through the comparative analysis of black-box and white-box models, the research found that:

- **EBM (Explainable Boosting Machine)**, as an example of a white-box model, is generally better suited to applications that require explainability and actionable insights.
- **Each model type has its own advantages and disadvantages**: Both have strengths and weaknesses in performance and explainability, and the choice should be determined by the requirements of the application scenario.

In general, this paper emphasizes the importance of model explainability in phishing detection for improving user trust and system reliability.
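
The experimental-verification step listed under Methods (comparing predictive performance and backing the comparison with a statistical test) could be sketched roughly as follows. This is a minimal illustration, not the paper's procedure: the summary does not say which statistical test was used, so the exact McNemar test here is an assumed choice. The function names and the toy labels (`y_true`, `white_box_pred`, `black_box_pred`) are hypothetical and do not come from the paper's datasets.

```python
from math import comb


def confusion(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means phishing."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    tp, _, _, tn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 for the phishing class; 0.0 when it is undefined."""
    tp, fp, fn, _ = confusion(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two classifiers on the same samples.

    Only discordant samples matter: those where exactly one model is correct.
    Under the null hypothesis each discordant sample favours either model
    with probability 0.5, so the counts follow a Binomial(n, 0.5).
    """
    a_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    b_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n = a_only + b_only
    if n == 0:
        return 1.0  # the models never disagree in correctness
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical toy predictions (1 = phishing, 0 = legitimate), for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
white_box_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]  # e.g. an interpretable model
black_box_pred = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0]  # e.g. an opaque ensemble

for name, pred in [("white-box", white_box_pred), ("black-box", black_box_pred)]:
    print(name, "accuracy:", accuracy(y_true, pred), "F1:", f1_score(y_true, pred))
print("McNemar exact p-value:", mcnemar_exact(y_true, white_box_pred, black_box_pred))
```

A paired test is used because both models are scored on the same samples, so the comparison depends only on the cases where exactly one of them is right. Explainability metrics would need a separate evaluation and are not covered by this sketch.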