A Method for Detecting Phishing Websites Based on Tiny-Bert Stacking

Daojing He, Xin Lv, Shanshan Zhu, Sammy Chan, Kim-Kwang Raymond Choo
DOI: https://doi.org/10.1109/jiot.2023.3292171
IF: 10.6
2023-01-01
IEEE Internet of Things Journal
Abstract: Network and cyberspace security remains challenging in our increasingly connected world, and the risks include those arising from malicious cyber activities such as phishing. To combat phishing, this article proposes a phishing website detection model based on tiny-Bert stacking. The core concept is to use tiny-Bert to extract features from website URL strings, learning the semantic features and long-range dependencies in URLs. We then build a Stacking-based classifier comprising four base learners: CatBoost, XGBoost, and LightGBM are the first-level learners, and GBDT is the second-level learner. This detection model can identify phishing websites without manual feature extraction, and the base learners of Stacking can compensate for each other's errors in the classification process, improving generalization and achieving higher accuracy. The proposed model is evaluated using a data set based on real phishing websites. Compared with state-of-the-art methods, the results show that the proposed model achieves an accuracy of up to 99.14% and a recall of up to 99.13%, and is more stable.
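The two-level Stacking layout described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: hashed character n-grams stand in for the tiny-Bert URL features, three scikit-learn `GradientBoostingClassifier` instances stand in for CatBoost, XGBoost, and LightGBM, and the URLs, labels, and hyperparameters are made up for the example. Only the second-level GBDT learner corresponds directly to a model named in the abstract.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.feature_extraction.text import HashingVectorizer

# Hypothetical toy URLs, invented for illustration (not the paper's data set).
benign = [
    "https://www.example.com/",
    "https://www.example.com/about",
    "https://docs.example.org/guide/install",
    "https://shop.example.com/cart",
    "https://news.example.net/2023/01/story",
    "https://mail.example.com/inbox",
    "https://blog.example.org/posts/42",
    "https://www.example.net/contact",
]
phishing = [
    "http://example.com.secure-login.test/verify?id=123",
    "http://paypa1-update.example.test/account/confirm",
    "http://192.0.2.10/bank/login.php",
    "http://secure.example.com.account-verify.test/signin",
    "http://login-example.test/update/password?user=a",
    "http://example-support.test/webscr/cmd=login",
    "http://free-gift.example.test/claim?token=abc",
    "http://198.51.100.7/secure/confirm.html",
]
urls = benign + phishing
y = np.array([0] * len(benign) + [1] * len(phishing))  # 0 = benign, 1 = phishing

# Assumption: hashed character n-grams replace the tiny-Bert feature extractor.
# HashingVectorizer is stateless, so transform() needs no prior fit().
vectorizer = HashingVectorizer(analyzer="char", ngram_range=(1, 3), n_features=256)
X = vectorizer.transform(urls).toarray()


def gbdt(depth, seed):
    """Small gradient-boosted tree model; hyperparameters are illustrative."""
    return GradientBoostingClassifier(
        n_estimators=20, max_depth=depth, random_state=seed
    )


# First level: three boosted-tree learners (stand-ins for CatBoost, XGBoost,
# LightGBM). Second level: a GBDT trained on their out-of-fold predictions.
stack = StackingClassifier(
    estimators=[
        ("catboost_standin", gbdt(2, 0)),
        ("xgboost_standin", gbdt(3, 1)),
        ("lightgbm_standin", gbdt(1, 2)),
    ],
    final_estimator=gbdt(2, 3),
    cv=3,  # out-of-fold predictions keep the meta-learner from seeing leaked labels
)
stack.fit(X, y)

preds = stack.predict(X)
proba = stack.predict_proba(X)
print("predicted labels:", preds.tolist())
```

The `cv=3` argument is what makes this Stacking rather than simple chaining: each first-level learner predicts only on folds it was not trained on, and the second-level GBDT learns from those held-out predictions. With the real libraries installed, the stand-ins could presumably be swapped for `CatBoostClassifier`, `XGBClassifier`, and `LGBMClassifier`, but that substitution is not tested here.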