DTOF-ANN: An Artificial Neural Network phishing detection model based on Decision Tree and Optimal Features

Erzhou Zhu, Yinyin Ju, Zhile Chen, Feng Liu, Xianyong Fang
DOI: https://doi.org/10.1016/j.asoc.2020.106505
IF: 8.7
2020-10-01
Applied Soft Computing
Abstract:<p>Recently, phishing has emerged as one of the biggest threats to people's everyday networking environments. Phishing attackers disguise illegitimate URLs as normal ones and use social engineering channels, such as emails and SMS, to steal users' private information, which calls for an effective method of preventing phishing attacks and reducing the losses they cause. Neural networks can be used to detect and prevent phishing attacks because of their strong ability to learn actively from massive datasets and their high accuracy in data classification. However, duplicate points in the public datasets and negative and useless features in the feature vectors can trap the training of neural networks in over-fitting, which weakens the trained classifier when it detects phishing websites. To tackle this shortcoming, this paper proposes DTOF-ANN (Decision Tree and Optimal Features based Artificial Neural Network), a neural-network phishing detection model based on a decision tree and optimal feature selection. First, the traditional K-medoids clustering algorithm is improved with an incremental selection of initial centers to remove the duplicate points from the public datasets. Then, an optimal feature selection algorithm based on a newly defined feature evaluation index, a decision tree, and a local search method is designed to prune out the negative and useless features. Finally, the optimal structure of the neural network classifier is constructed by properly adjusting its parameters, and the classifier is trained on the selected optimal features. Experimental results demonstrate that DTOF-ANN exhibits higher performance than many existing methods.</p>
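The first step of the abstract (removing duplicate points and choosing initial cluster centers incrementally) can be illustrated with a minimal sketch. The abstract does not specify how the incremental selection works, so the farthest-first rule used here is an assumption for illustration only, not the authors' algorithm; the function names `drop_duplicates` and `incremental_initial_medoids` are likewise hypothetical.

```python
from math import dist


def drop_duplicates(points):
    """Remove exact duplicate feature vectors, keeping the first occurrence."""
    seen = set()
    unique = []
    for p in points:
        key = tuple(p)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


def incremental_initial_medoids(points, k):
    """Pick k initial medoids one at a time.

    ASSUMPTION: a farthest-first heuristic stands in for the paper's
    unspecified incremental selection rule.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Start from the most central point: smallest total distance to all points.
    first = min(
        range(len(points)),
        key=lambda i: sum(dist(points[i], q) for q in points),
    )
    chosen = [first]

    # Add the point that is farthest from its nearest already-chosen medoid.
    while len(chosen) < k:
        candidates = (i for i in range(len(points)) if i not in chosen)
        nxt = max(
            candidates,
            key=lambda i: min(dist(points[i], points[c]) for c in chosen),
        )
        chosen.append(nxt)

    return [points[i] for i in chosen]
```

Calling `drop_duplicates` on a list of feature vectors and passing the result to `incremental_initial_medoids` yields starting centers that are spread across the data, which a standard K-medoids loop could then refine.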
computer science, artificial intelligence, interdisciplinary applications
What problem does this paper attempt to address?