PhiUSIIL: A diverse security profile empowered phishing URL detection framework based on similarity index and incremental learning

Arvind Prasad, Shalini Chandra
DOI: https://doi.org/10.1016/j.cose.2023.103545
IF: 5.105
2023-10-22
Computers & Security
Abstract: With the proliferation of the World Wide Web and the increasing sophistication of cyber threats, phishing attacks have emerged as a significant concern for individuals and organizations alike. Phishing attacks, commonly executed through deceptive URLs, aim to trick users into divulging sensitive information, leading to financial loss, identity theft, or the compromise of sensitive data. Phishing continues to pose a significant threat in today's digital landscape, necessitating the development of effective and efficient detection frameworks. This article presents PhiUSIIL, a Phishing URL detection framework based on Similarity Index and Incremental Learning. The similarity index helps identify visual similarity-based attacks such as zero-width character, homograph, punycode, homophone, bitsquatting, and combosquatting attacks. The incremental learning approach allows the framework to continuously update its knowledge base with new data. Further, the framework implements diverse security profiles to accommodate the differing security requirements of users and organizations. PhiUSIIL extracts URL features, downloads the webpage at each URL to extract HTML features, and derives new features from the existing information to construct a phishing URL dataset, named the PhiUSIIL phishing URL dataset, comprising 134,850 legitimate and 100,945 phishing URLs. The proposed framework was evaluated extensively on this dataset, and the constructed dataset helps improve detection accuracy when used in a pre-training approach. PhiUSIIL achieved an accuracy of 99.24% with a fully incremental training approach and 99.79% with a pre-training approach. The experimental results demonstrate its effectiveness and indicate that the framework can remain effective and up to date against emerging and sophisticated phishing techniques.
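To make the idea of a similarity index concrete, the following is a minimal sketch and not the authors' method: the abstract does not specify how PhiUSIIL computes its index. The function names `normalize_domain` and `similarity_index`, the list of zero-width characters, and the use of Python's `difflib.SequenceMatcher` ratio as the similarity measure are all assumptions made for illustration. The sketch strips zero-width characters, decodes punycode labels, and then scores a candidate domain against a list of known legitimate domains.

```python
import unicodedata
from difflib import SequenceMatcher

# Assumption: a small, non-exhaustive set of zero-width code points.
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def normalize_domain(domain: str) -> str:
    """Lower-case, decode punycode labels, drop zero-width characters."""
    labels = []
    for label in domain.lower().split("."):
        if label.startswith("xn--"):
            try:
                # Decode the part after the "xn--" prefix back to Unicode.
                label = label[4:].encode("ascii").decode("punycode")
            except (UnicodeError, ValueError):
                pass  # leave undecodable labels unchanged
        labels.append(label)
    text = "".join(ch for ch in ".".join(labels) if ch not in ZERO_WIDTH)
    # NFKC folds compatibility forms (e.g. full-width letters); it does NOT
    # map cross-script look-alikes such as Cyrillic "а" to Latin "a".
    return unicodedata.normalize("NFKC", text)


def similarity_index(domain: str, known_domains: list[str]) -> tuple[str, float]:
    """Return the closest known domain and a similarity score in [0, 1]."""
    candidate = normalize_domain(domain)

    def score(known: str) -> float:
        return SequenceMatcher(None, candidate, known).ratio()

    best = max(known_domains, key=score)
    return best, score(best)


if __name__ == "__main__":
    known = ["example.com", "python.org"]
    print(similarity_index("examp1e.com", known))
```

Under these assumptions, a score that is high but below 1.0 suggests a domain that imitates a known one without matching it exactly, which is the kind of signal a detector could pass on as a feature. A production system would more likely use a dedicated confusables table than a generic string ratio.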
computer science, information systems