Abstract: In malicious URL detection, traditional classifiers are challenged because the data volume is huge, patterns change over time, and the correlations among features are complicated. Feature engineering plays an important role in addressing these problems. To better represent the underlying problem and improve the performance of classifiers in identifying malicious URLs, this paper proposed a combination of linear and nonlinear space transformation methods. For the linear transformation, a two-stage distance metric learning approach was developed: first, singular value decomposition was performed to obtain an orthogonal space, and then a linear program was solved to obtain an optimal distance metric. For the nonlinear transformation, we introduced the Nyström method for kernel approximation and used the revised distance metric in its radial basis function, so that the merits of both linear and nonlinear transformations can be utilized. A total of 331,622 URLs with 62 features were collected to validate the proposed feature engineering methods. The results showed that the proposed methods significantly improved the efficiency and performance of certain classifiers, such as k-Nearest Neighbor, Support Vector Machine, and neural networks. The malicious URL identification rate of k-Nearest Neighbor increased from 68% to 86%, that of the linear Support Vector Machine from 58% to 81%, and that of the Multi-Layer Perceptron from 63% to 82%. We also developed a website to demonstrate a malicious URL detection system that uses the methods proposed in this paper. The system can be accessed at: http://url.jspfans.com.
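To make the pipeline described in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the learned metric can be represented as a diagonal weight vector in the SVD-rotated space; the linear program that would learn those weights is not reproduced here, so `weights` is a placeholder of ones. The function names, `gamma`, the number of components, the number of landmarks, and the random data are all hypothetical choices for illustration.

```python
import numpy as np

def svd_rotate(X, n_components):
    """Stage 1: center X and return the mean plus the top right-singular vectors."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components].T  # columns form an orthonormal basis

def weighted_rbf(A, B, weights, gamma):
    """RBF kernel with a diagonal metric: d^2(a, b) = sum_k w_k * (a_k - b_k)^2."""
    Aw = A * np.sqrt(weights)
    Bw = B * np.sqrt(weights)
    sq = (Aw ** 2).sum(1)[:, None] + (Bw ** 2).sum(1)[None, :] - 2.0 * Aw @ Bw.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystroem_features(Z, landmarks, weights, gamma):
    """Nystrom map: Phi = K_nm @ K_mm^(-1/2), so that Phi @ Phi.T approximates K."""
    K_mm = weighted_rbf(landmarks, landmarks, weights, gamma)
    K_nm = weighted_rbf(Z, landmarks, weights, gamma)
    vals, vecs = np.linalg.eigh(K_mm)           # K_mm is symmetric PSD
    vals = np.maximum(vals, 1e-12)              # guard against tiny eigenvalues
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return K_nm @ inv_sqrt

# Synthetic stand-in for the URL feature matrix (62 features, as in the abstract).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 62))

mean, V = svd_rotate(X, n_components=10)
Z = (X - mean) @ V                              # data in the orthogonal space

weights = np.ones(10)   # ASSUMPTION: placeholder for the LP-learned metric
gamma = 0.1             # ASSUMPTION: illustrative kernel width

idx = rng.choice(len(Z), size=20, replace=False)
landmarks = Z[idx]
Phi = nystroem_features(Z, landmarks, weights, gamma)
```

The rows of `Phi` could then be passed to a linear classifier in place of the raw features. On the landmark rows the approximation is exact up to numerical error, which gives a quick sanity check of the construction.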

A Malicious Web Page Detection Model based on SVM Algorithm: Research on the Enhancement of SVM Efficiency by Multiple Machine Learning Algorithms

A Malicious Web Page Detection Model Based on SVM Algorithm

Detection Method of Computer Worms Based on SVM

Detecting Malicious Domains Using Modified SVM Model

Malicious URL Detection with Feature Extraction Based on Machine Learning

A Dynamic and Static Combined Android Malicious Code Detection Model Based on SVM

An efficient SVM-Based method to detect malicious attacks for web servers

Research on Web Spam Detection Based on Support Vector Machine

Malicious Web Page Detection Based on On-Line Learning Algorithm

Study of the Web Spam Detection Based on the Support Vector Machine

Finding Effective Classifier for Malicious URL Detection

Phishing Detection System Based on SVM Active Learning Algorithm

Web Intrusion Detection System Combined with Feature Analysis and SVM Optimization

A Comparative Evaluation of Ensemble Classifiers for Malicious Webpage Detection

A Detection Method for Network Security Based on the Combination of Support Vector Machine

Intrusion Detection Model Based on Improved Support Vector Machine

TSMWD: A High-Speed Malicious Web Page Detection System Based on Two-Step Classifiers

Web Spam Detection Using Multiple Kernels in Twin Support Vector Machine

Detection of Malicious Websites Using Machine Learning Techniques

Malware Analysis and Detection Using Machine Learning Algorithms

Improving Malicious URLs Detection Via Feature Engineering: Linear and Nonlinear Space Transformation Methods