Abstract: Phishing is a cyber-attack that exploits victims' lack of technical knowledge or naivety and commonly involves a Uniform Resource Locator (URL). It is therefore beneficial to examine URLs before accessing them in order to spot a phishing attack. Several machine-learning-based algorithms have been proposed to detect phishing attempts. However, these approaches often perform poorly, with lower accuracy, longer response times, and higher false-positive rates. Furthermore, many existing methods rely heavily on predefined feature sets, which may limit their adaptability and robustness. In contrast, our proposed method uses a more dynamic data-preparation and feature-selection process: a Conditional Wasserstein Generative Adversarial Network (CWGAN) addresses data imbalance, and the Binary Grey Goose Optimization Algorithm (BGGOA) selects an optimal feature subset. This dynamic approach enhances the model's ability to adapt to varying data characteristics, improving detection performance. The proposed solution is divided into two stages: pre-deployment and deployment. During the pre-deployment stage, the dataset is preprocessed through data transformation, removal of irrelevant and redundant data, and data balancing. Minority-class samples are synthesized with CWGAN to counter class imbalance. Features are then selected using BGGOA, yielding a feature-reduced dataset that is used to train and test an ensemble of deep learning classifiers, namely the novel Pyramid Depth-wise Separable-MobileNetV3 (PyDS-MV3) and the Deformable Convolutional Residual Neural Network (DCRNN), together termed PDSMV3-DCRNN. During the deployment stage, the Boosted ConvNeXt approach extracts URL features, which are fed into the trained classifier to predict "phishing" or "benign". Experimental results show that the proposed solution outperforms all other tested approaches, achieving a shorter training time of 0.11 seconds and an accuracy of 99.21%.
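To make the idea of binary, wrapper-style feature selection more concrete, the sketch below shows the general pattern such optimizers follow: each search agent holds a continuous position, a sigmoid transfer function turns that position into a 0/1 feature mask, and a fitness function trades classification error against the fraction of features kept. This is a generic stand-in under stated assumptions, not BGGOA's actual update equations. The nearest-centroid classifier, the fitness weight `alpha = 0.99`, the move-toward-best update rule, and every function name here (`binary_wrapper_select`, `make_toy_data`, etc.) are hypothetical choices made for illustration only.

```python
import math
import random


def sigmoid(x):
    """S-shaped transfer function mapping a continuous position to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def nearest_centroid_error(X_tr, y_tr, X_va, y_va, mask):
    """Validation error of a nearest-centroid classifier on the selected features (hypothetical stand-in classifier)."""
    cols = [j for j, bit in enumerate(mask) if bit]
    centroids = {}
    for label in set(y_tr):
        rows = [x for x, y in zip(X_tr, y_tr) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    wrong = 0
    for x, y in zip(X_va, y_va):
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(centroids, key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
        wrong += pred != y
    return wrong / len(y_va)


def fitness(mask, data, alpha=0.99):
    """Weighted sum of validation error and selected-feature ratio (lower is better)."""
    if not any(mask):
        return 1.0  # An empty subset cannot classify anything: assign the worst score.
    err = nearest_centroid_error(*data, mask)
    return alpha * err + (1 - alpha) * sum(mask) / len(mask)


def binary_wrapper_select(data, n_features, n_agents=10, n_iter=20, seed=0):
    """Generic binary population search with a sigmoid transfer function (assumed, not BGGOA)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_agents)]

    def binarize(p):
        # Each bit is switched on with probability sigmoid(position).
        return [1 if rng.random() < sigmoid(v) else 0 for v in p]

    best_mask, best_fit = None, float("inf")
    for _ in range(n_iter):
        for mask in (binarize(p) for p in pos):
            f = fitness(mask, data)
            if f < best_fit:
                best_mask, best_fit = mask, f
        # Pull each agent toward the best mask (mapped to +/-4) and add Gaussian exploration noise.
        target = [4.0 if b else -4.0 for b in best_mask]
        for p in pos:
            for j in range(n_features):
                p[j] += rng.random() * (target[j] - p[j]) + rng.gauss(0, 0.3)
    return best_mask, best_fit


def make_toy_data(n=60, n_noise=5, seed=1):
    """Synthetic data: feature 0 separates the classes; the remaining features are pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label * 10 + rng.random()] + [rng.random() for _ in range(n_noise)])
        y.append(label)
    split = int(0.7 * n)
    return X[:split], y[:split], X[split:], y[split:]


if __name__ == "__main__":
    data = make_toy_data()
    mask, fit = binary_wrapper_select(data, n_features=6)
    print("selected mask:", mask, "fitness:", round(fit, 4))
```

On the toy data, the search should keep feature 0 (the only informative one) and pay a small penalty for any noise features it retains. The small weight on the feature ratio is what pushes wrapper methods of this kind toward compact subsets once accuracy is saturated; in a real pipeline, the toy classifier would be replaced by the downstream model or a cheaper proxy for it.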

Feature Extraction or Feature Selection for Text Classification: A Case Study on Phishing Email Detection

CLDA: Feature Selection for Text Categorization Based on Constrained LDA

Text Mining for Phishing E-mail Detection

Highly Discriminative Features for Phishing Email Classification by SVD

Fuzzy Rough Set Feature Selection to Enhance Phishing Attack Detection

Modeling Suspicious Email Detection using Enhanced Feature Selection

Phishing Email Detection Based on Binary Search Feature Selection

Efficient Method for Feature Selection in Text Classification

Improving Phishing Email Detection Using the Hybrid Machine Learning Approach

Feature Extraction and Selection Hybrid Algorithm for Hyperspectral Imagery Classification

Phishing URL Detection using Machine Learning

On the Relationship Between Feature Selection and Classification Accuracy

Enhancing Phishing Detection through Feature Importance Analysis and Explainable AI: A Comparative Study of CatBoost, XGBoost, and EBM Models

A Classifier Model to Detect Phishing Emails Using Ensemble Technique

Phishing Attacks Detection -- A Machine Learning-Based Approach

A survey of feature selection and feature extraction techniques in machine learning

PDSMV3-DCRNN: A Novel Ensemble Deep Learning Framework for Enhancing Phishing Detection and URL Extraction

PDHF: Effective phishing detection model combining optimal artificial and automatic deep features

Can Features for Phishing URL Detection Be Trusted Across Diverse Datasets? A Case Study with Explainable AI

Phishing email detection using deep learning algorithms

An effective and efficient two-stage Dimensionality reduction algorithm for content-based spam filtering