Abstract: Malicious webpages are developed or manipulated to serve as attack tools and are considered one of the main drivers of criminal activity on the Internet. It is therefore essential to detect such webpages and prevent end users from accessing them. Conventional malicious webpage detection techniques search a blacklist of webpages that have been classified as malicious from the perspective of users. However, these techniques have high false-negative rates, especially against sophisticated attacks, because of technical and computational limitations. Hence, machine learning techniques have been employed to classify webpages by systematically analyzing a set of features that reflect the characteristics of a malicious webpage. This paper compares the prediction accuracy of several machine learning classification algorithms and ensemble techniques. A data set of 5000 URL instances with 189 different features is used in the comparative study. The results show that Support Vector Machine (SVM) is the most accurate classifier under MultiBoost and AdaBoost, while K-Nearest Neighbor (k-NN) is the most accurate under bagging and random subspace.

A Comparative Evaluation of Ensemble Classifiers for Malicious Webpage Detection