Malicious URL Detection with Feature Extraction Based on Machine Learning

Baojiang Cui, Shanshan He, Xi Yao, Peilin Shi
DOI: https://doi.org/10.1504/ijhpcn.2018.10015545
2018-01-01
International Journal of High Performance Computing and Networking
Abstract: Many web applications suffer from various web attacks due to the lack of awareness concerning security. Therefore, it is necessary to improve the reliability of web applications by accurately detecting malicious URLs. In previous studies, keyword matching has always been used to detect malicious URLs, but this method is not adaptive. In this paper, statistical analyses based on gradient learning and feature extraction using a sigmoidal threshold level are combined to propose a new detection approach based on machine learning techniques. Moreover, the naïve Bayes, decision tree and SVM classifiers are used to validate the accuracy and efficiency of this method. Finally, the experimental results demonstrate that this method has a good detection performance, with an accuracy rate above 98.7%. In practical use, this system has been deployed online and is being used in large-scale detection, analysing approximately 2 TB of data every day.
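The abstract does not list the features or the training procedure, so the following is only a minimal sketch of the general idea (lexical feature extraction from a URL followed by a classifier learned by gradient descent with a sigmoid output), not the authors' method. The feature set, the SUSPICIOUS_TOKENS list, the toy URLs and all function names are assumptions made for illustration, and a small logistic-regression model stands in for the naïve Bayes, decision tree and SVM classifiers named in the abstract.

```python
import math
from urllib.parse import urlsplit, unquote

# Hypothetical token list for illustration only; not taken from the paper.
SUSPICIOUS_TOKENS = ("<script", "select ", "union ", "../", "' or ")


def extract_features(url: str) -> list[float]:
    """Map a URL to three hand-picked lexical features scaled into [0, 1]."""
    parts = urlsplit(url)
    text = (unquote(parts.path) + "?" + unquote(parts.query)).lower()
    specials = sum(text.count(c) for c in "<>'\"();=%")
    hits = sum(1 for token in SUSPICIOUS_TOKENS if token in text)
    return [
        min(len(text) / 100.0, 1.0),  # length of path plus query
        min(specials / 10.0, 1.0),    # count of special characters
        min(hits / 2.0, 1.0),         # count of suspicious tokens
    ]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, labels, epochs=500, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the pre-activation
            weights = [w - lr * err * v for w, v in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(url: str, weights, bias, threshold=0.5) -> bool:
    """Return True when the sigmoid score reaches the decision threshold."""
    x = extract_features(url)
    score = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
    return score >= threshold


# Toy training data, invented for this sketch (1 = malicious, 0 = benign).
TRAIN_URLS = [
    ("http://example.com/index.html", 0),
    ("http://example.com/search?q=shoes", 0),
    ("http://example.com/user/profile?id=42", 0),
    ("http://example.com/search?q=<script>alert(1)</script>", 1),
    ("http://example.com/item?id=1' or '1'='1", 1),
    ("http://example.com/download?file=../../etc/passwd", 1),
]

if __name__ == "__main__":
    X = [extract_features(u) for u, _ in TRAIN_URLS]
    y = [label for _, label in TRAIN_URLS]
    w, b = train(X, y)
    for url, label in TRAIN_URLS:
        print(label, predict(url, w, b), url)
```

Unlike a fixed keyword match, the weights here are learned from labelled examples, which is the kind of adaptivity the abstract argues for; a realistic system would need far more data and features than this toy set.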