PDSMV3-DCRNN: A Novel Ensemble Deep Learning Framework for Enhancing Phishing Detection and URL Extraction
Y. Bhanu Prasad, D. Venkatesulu
DOI: https://doi.org/10.1016/j.cose.2024.104123
IF: 5.105
2024-09-20
Computers & Security
Abstract: Phishing is a cyber-attack that exploits victims' technical ignorance or naivety and commonly involves a Uniform Resource Locator (URL). It is therefore beneficial to examine URLs before accessing them in order to spot a phishing attack. Several machine learning algorithms have been proposed to detect phishing attempts. However, these approaches often perform poorly, with lower accuracy, longer response times, and higher false positive rates. Furthermore, many existing methods rely heavily on predefined feature sets, which may limit their adaptability and robustness. In contrast, our proposed method uses a more dynamic feature selection process, which includes the Conditional Wasserstein Generative Adversarial Network (CWGAN) for addressing data imbalance and the Binary Grey Goose Optimization Algorithm (BGGOA) for optimal feature selection. This dynamic approach enhances the model's ability to adapt to varying data characteristics, improving detection performance. The proposed solution is divided into two stages: pre-deployment and deployment. During the pre-deployment stage, the dataset is preprocessed, which includes transforming the data, handling irrelevant and redundant data, and balancing the classes. Minority samples are increased using CWGAN to avoid class imbalance. Features are then selected using BGGOA, resulting in a feature-reduced dataset that is used to train and test the ensemble deep learning classifiers, specifically the Novel Pyramid Depth-wise Separable-MobileNetV3 (PyDS-MV3) and the Deformable Convolutional Residual Neural Network (DCRNN), together termed PDSMV3-DCRNN. During the deployment stage, the Boosted ConvNeXt approach extracts URL features, which are fed into the trained classifier to predict "phishing" or "benign". According to the experimental findings, the proposed solution outperforms all other tested approaches, with a faster training time of 0.11 seconds and an accuracy of 99.21%.
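The deployment stage described in the abstract (extract features from a URL, then pass them to a trained classifier that outputs "phishing" or "benign") can be illustrated with a minimal Python sketch. This is not the authors' implementation: the hand-written lexical features below stand in for the Boosted ConvNeXt extractor, and `toy_score` is a hypothetical placeholder for the trained PDSMV3-DCRNN classifier. All function and feature names are assumptions made for illustration only.

```python
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Hand-written lexical features (an assumed stand-in for a learned extractor)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # Crude dotted-quad check: four numeric groups separated by three dots.
        "host_is_ipv4": host.count(".") == 3 and host.replace(".", "").isdigit(),
    }


def classify(features: dict, score_fn, threshold: float = 0.5) -> str:
    """Map a score in [0, 1] from any scoring function to the two labels."""
    return "phishing" if score_fn(features) >= threshold else "benign"


def toy_score(features: dict) -> float:
    """Hypothetical placeholder for a trained model; not a real detector."""
    suspicious = features["has_at_symbol"] or features["host_is_ipv4"]
    return 1.0 if suspicious else 0.0


if __name__ == "__main__":
    for url in ("http://192.168.0.1/login", "https://example.com/"):
        print(url, "->", classify(extract_url_features(url), toy_score))
```

Keeping feature extraction and scoring behind separate functions mirrors the two-step deployment flow in the abstract, so a trained model could replace `toy_score` without changing the surrounding code.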
computer science, information systems