A Malicious Web Page Detection Model based on SVM Algorithm: Research on the Enhancement of SVM Efficiency by Multiple Machine Learning Algorithms

Jingbing Chen, Jie Yuan, Yuewei Li, Yiqi Zhang, Yufan Yang, Ruiqi Feng
DOI: https://doi.org/10.1145/3446132.3446183
2020-01-01
Abstract: In recent years, the high availability and convenience of the Internet have led more and more information and services to be delivered online. As people rely increasingly on the Internet, network security issues have become more prominent, and a large number of malicious web pages have emerged. Detecting malicious web pages proactively and efficiently has therefore become a research focus in network security worldwide. This paper uses the Support Vector Machine (SVM) algorithm to learn autonomously from data and build the classifier. The TF-IDF method is used to process the collected URL data and obtain its feature matrix, which is normalized, standardized, and stored as a sparse matrix. To prevent relatively strong features from dominating the classifier's results, the K-Means and TruncatedSVD methods are used to reduce the dimensionality of the features. A linear kernel is used for large samples and a Gaussian kernel for small samples, to optimize the classifier's performance. During training, grid search is used to find the optimal parameters, forming a complete detection system. Ten-fold cross-validation is then used to measure the classifier's accuracy, recall, precision, and F1 score. Finally, the experimental results show that the malicious web page detection model provides a useful reference for big data processing.
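The abstract describes a pipeline of TF-IDF features, normalization and standardization on a sparse matrix, K-Means and TruncatedSVD dimension reduction, a kernel chosen by sample size, grid search, and ten-fold cross-validation. The sketch below reconstructs that pipeline with scikit-learn, whose `TruncatedSVD` class the abstract names; it is an illustration under stated assumptions, not the authors' code. Everything specific here is hypothetical and not from the paper: character 3- to 5-gram TF-IDF over raw URLs, 10 SVD components, K-Means used as a reducer by taking distances to 4 centroids, the 10,000-sample threshold for switching kernels, the parameter grids, and the synthetic URLs.

```python
# Sketch of a URL classifier in the spirit of the abstract (scikit-learn).
# Hypothetical choices (not from the paper): char 3-5-gram TF-IDF, SVD/K-Means
# sizes, the 10,000-sample kernel threshold, the grids, and the toy URLs.
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

LARGE_SAMPLE_THRESHOLD = 10_000  # hypothetical cut-off between "large" and "small"
METRICS = ("accuracy", "precision", "recall", "f1")


def build_search(n_samples: int) -> GridSearchCV:
    """Return a grid search over a TF-IDF -> reduce -> SVM pipeline."""
    # Dimension reduction: SVD components plus distances to K-Means centroids.
    reduce = FeatureUnion([
        ("svd", TruncatedSVD(n_components=10, random_state=0)),
        ("kmeans", KMeans(n_clusters=4, n_init=3, random_state=0)),
    ])
    if n_samples >= LARGE_SAMPLE_THRESHOLD:
        # Large sample: linear kernel, solved in the primal by liblinear.
        clf, grid = LinearSVC(dual=False), {"clf__C": [0.1, 1.0, 10.0]}
    else:
        # Small sample: Gaussian (RBF) kernel.
        clf = SVC(kernel="rbf")
        grid = {"clf__C": [0.1, 1.0, 10.0], "clf__gamma": ["scale", 0.1]}
    pipe = Pipeline([
        # TF-IDF rows are L2-normalized by default and stay sparse.
        ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(3, 5))),
        # Standardize without centering so the matrix remains sparse.
        ("scale", StandardScaler(with_mean=False)),
        ("reduce", reduce),
        ("clf", clf),
    ])
    folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    return GridSearchCV(pipe, grid, scoring="f1", cv=folds)


# Synthetic toy URLs for illustration only (label 1 = malicious).
benign = [f"https://www.example{i}.org/docs/page{i}.html" for i in range(20)]
malicious = [f"http://secure-login{i}.xyz/verify/account.php?session={i}" for i in range(20)]
urls, labels = benign + malicious, [0] * 20 + [1] * 20

search = build_search(len(urls)).fit(urls, labels)
print("best parameters:", search.best_params_)

# Ten-fold cross-validation of the tuned model on the four reported metrics.
folds = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
cv = cross_validate(search.best_estimator_, urls, labels, cv=folds,
                    scoring=list(METRICS))
scores = {m: cv[f"test_{m}"].mean() for m in METRICS}
print(scores)
```

Because grid search and evaluation reuse the same ten folds here, the reported scores are optimistic; passing the unfitted object from `build_search(len(urls))` to `cross_validate` instead gives a nested, stricter estimate at roughly ten times the cost.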