On Phishing URLs Detection Using Feature Extension

Daojing He, Zhihua Liu, Xin Lv, Sammy Chan, Mohsen Guizani
DOI: https://doi.org/10.1109/jiot.2024.3446894
IF: 10.6
2024-01-01
IEEE Internet of Things Journal
Abstract: Phishing is a common and highly damaging form of cybercrime. Phishing attacks occur repeatedly and have caused huge economic losses. With the rapid growth of blockchain and cryptocurrency, the large sums of money involved and the immature ecosystem have attracted phishing attacks in large numbers. Phishing has become the main means of attack in this field, posing a serious security threat to users' digital assets. Existing methods for detecting phishing websites depend on the quality of URL feature extraction, and the perspectives from which features are extracted are becoming increasingly rigid. This paper therefore proposes a phishing URL detection model based on feature extension. The method uses the TextRank algorithm to generate a feature extension library and embeds the extracted features into the URL to be detected. After the URL is vectorized, it is fed into the two-layer classification network proposed in this paper to classify the website. The classifier consists of a BERT layer for the upstream task and a CNN layer for the downstream task. It can learn the comprehensive representation and the local features of a URL simultaneously, which effectively avoids overfitting and improves the ability to identify phishing websites. Comparative experiments are conducted on a dataset of real phishing websites. The experimental results show that the model achieves higher accuracy and stability than other phishing website detection models.
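The feature extension step can be illustrated with a minimal sketch. The abstract does not give the tokenization, graph construction, or embedding format, so everything below is an assumption made for illustration: the function names `tokenize`, `textrank`, `build_library`, and `extend_url` are hypothetical, the co-occurrence window and damping factor are common TextRank defaults rather than the paper's settings, and appending keywords after a `[SEP]` marker is only one plausible way to embed features into a URL. The BERT and CNN layers are not shown.

```python
import re
from collections import defaultdict


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens (assumed tokenization)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def textrank(token_lists, window=2, damping=0.85, iterations=50):
    """Score tokens with PageRank over an undirected co-occurrence graph."""
    neighbours = defaultdict(set)
    for tokens in token_lists:
        for i, tok in enumerate(tokens):
            # Link each token to the next `window` tokens in the same URL.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != tok:
                    neighbours[tok].add(other)
                    neighbours[other].add(tok)
    scores = {tok: 1.0 for tok in neighbours}
    for _ in range(iterations):
        scores = {
            tok: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[tok])
            for tok in neighbours
        }
    return scores


def build_library(urls, top_k=5):
    """Return the top_k highest-scoring tokens as a feature extension library."""
    scores = textrank([tokenize(u) for u in urls])
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def extend_url(url, library):
    """Append library keywords found in the URL (hypothetical embedding format)."""
    present = set(tokenize(url))
    hits = [kw for kw in library if kw in present]
    return url + " [SEP] " + " ".join(hits) if hits else url


if __name__ == "__main__":
    # Made-up URLs for demonstration only, not data from the paper.
    sample = [
        "http://secure-login.example.com/wallet/verify",
        "http://example.org/login/verify/account",
        "http://wallet-verify.example.net/login",
    ]
    library = build_library(sample, top_k=3)
    print(library)
    print(extend_url("http://example.io/wallet/login", library))
```

In a full pipeline the extended string would then be vectorized and passed to the classifier; this sketch stops at the extended text because the abstract does not specify how vectorization is done.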