Abstract: The advent of Internet technologies has led to the proliferation of electronic trading and online transactions, and with it a rise in unauthorized access to sensitive user information and the depletion of enterprise resources. As a consequence, there has been a marked increase in phishing, which is now considered one of the most common types of online theft. Phishing attacks typically aim to obtain confidential information, such as login credentials for online banking platforms and sensitive systems, which is then used for financial gain or identity theft. Recent studies have sought to combat phishing by examining website characteristics such as URLs, page content, and combinations of both drawn from the website and its source code. However, businesses require more effective anti-phishing technologies to identify phishing URLs and safeguard their users. This research evaluates the effectiveness of eight machine learning (ML) and deep learning (DL) algorithms in identifying phishing: support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF), decision tree (DT), Extreme Gradient Boosting (XGBoost), logistic regression (LR), convolutional neural network (CNN), and a DL model. The study uses two real datasets, Mendeley and UCI, and the performance metrics accuracy, precision, recall, false positive rate (FPR), and F1 score. Notably, CNN achieves the highest accuracy of the evaluated models. Contributions include the use of purpose-specific datasets, meticulous feature engineering, the application of SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance, a novel CNN model, and rigorous hyperparameter tuning. Model performance is consistent across both datasets, indicating stability and reliability.
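As an illustration of the evaluation metrics named above, the following minimal sketch computes accuracy, precision, recall, FPR, and F1 score from the four confusion-matrix counts of a binary phishing classifier. The function name `classification_metrics` and the example counts are assumptions made for illustration; they are not taken from the study's code or results.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion counts.

    tp: phishing URLs correctly flagged      fp: legitimate URLs wrongly flagged
    tn: legitimate URLs correctly passed     fn: phishing URLs missed
    Ratios with a zero denominator are reported as 0.0.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # False positive rate: share of legitimate URLs that were wrongly flagged.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "f1": f1,
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting FPR alongside recall matters in this setting because a detector that flags many legitimate URLs can look strong on recall while being impractical to deploy.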
