Multimodal fraudulent website identification method based on heterogeneous model ensemble

Shengli Zhou, Linqi Ruan, Qingyang Xu, Mincheng Chen
DOI: https://doi.org/10.23919/jcc.fa.2022-0234.202305
2023-05-23
China Communications
Abstract: The feature analysis of fraudulent websites is of great significance for combating, preventing, and controlling telecom fraud crimes. To address the shortcomings of existing analytical approaches, namely their reliance on a single dimension and their vulnerability to anti-reconnaissance, this paper adopts Stacking, an ensemble learning algorithm, to combine multiple modalities such as text, images, and URLs, and proposes a multimodal fraudulent website identification method that ensembles heterogeneous models. First, cross-validation is used to train multiple strong but highly diverse base classifiers, such as a BERT model, a residual neural network (ResNet), and a logistic regression model, which classify the text, image, and URL features, respectively. The outputs of the base classifiers are then taken as the input of a meta-classifier, whose output serves as the final identification result. The study shows that the fusion method is more effective at identifying fraudulent websites than single-modal methods, increasing recall by at least 1%. In addition, deploying the algorithm in a real Internet environment improves identification accuracy by at least 1.9% compared with other fusion methods.
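The Stacking scheme summarized above (cross-validated base classifiers, one per modality, whose outputs feed a meta-classifier) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. Every model here is a hypothetical stand-in: a tiny hand-written logistic regression replaces BERT, ResNet, and the URL model alike, and logistic regression is also assumed as the meta-classifier, which the abstract does not specify. The synthetic `make_data` features, the 5-fold split, and all function names are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[Vector]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticRegression:
    """Tiny batch-gradient-descent logistic regression.

    Hypothetical stand-in for every model in this sketch; in the paper the
    base classifiers are BERT (text), ResNet (image) and logistic regression (URL).
    """

    def __init__(self, lr: float = 0.5, epochs: int = 200) -> None:
        self.lr, self.epochs = lr, epochs
        self.w: Vector = []
        self.b = 0.0

    def fit(self, X: Matrix, y: Sequence[int]) -> "LogisticRegression":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            gw, gb = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba_one(xi) - yi  # d(log-loss)/d(logit)
                for j in range(d):
                    gw[j] += err * xi[j]
                gb += err
            self.w = [wj - self.lr * g / n for wj, g in zip(self.w, gw)]
            self.b -= self.lr * gb / n
        return self

    def predict_proba_one(self, x: Vector) -> float:
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def stacking_fit(modalities: List[Matrix], y: Sequence[int], k: int = 5):
    """Fit one base model per modality, then a meta-classifier on their out-of-fold scores."""
    n = len(y)
    folds = kfold_indices(n, k)
    meta_X = [[0.0] * len(modalities) for _ in range(n)]
    base_models = []
    for m, X in enumerate(modalities):
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            model = LogisticRegression().fit([X[i] for i in train], [y[i] for i in train])
            # Each sample is scored by a model that never saw it, so the
            # meta-classifier is trained on non-overfit base outputs.
            for i in fold:
                meta_X[i][m] = model.predict_proba_one(X[i])
        # Refit on all training data for use at inference time.
        base_models.append(LogisticRegression().fit(X, y))
    meta = LogisticRegression().fit(meta_X, y)
    return base_models, meta


def stacking_predict(base_models, meta, modalities: List[Matrix]) -> List[int]:
    """Score each modality with its base model, then let the meta-classifier decide."""
    n = len(modalities[0])
    meta_X = [[bm.predict_proba_one(X[i]) for bm, X in zip(base_models, modalities)]
              for i in range(n)]
    return [int(meta.predict_proba_one(row) >= 0.5) for row in meta_X]


def make_data(n: int, seed: int) -> Tuple[List[Matrix], List[int]]:
    """Hypothetical synthetic 'text', 'image' and 'URL' features (label 1 = fraudulent)."""
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]

    def modality(dim: int, noise: float) -> Matrix:
        return [[(2 * yi - 1) + rng.gauss(0, noise) for _ in range(dim)] for yi in y]

    return [modality(3, 1.0), modality(2, 1.0), modality(1, 0.8)], y


if __name__ == "__main__":
    train_mods, y_train = make_data(200, seed=1)
    test_mods, y_test = make_data(200, seed=2)
    bases, meta = stacking_fit(train_mods, y_train)
    pred = stacking_predict(bases, meta, test_mods)
    print("accuracy:", sum(p == t for p, t in zip(pred, y_test)) / len(y_test))
```

The key design choice is the out-of-fold step in `stacking_fit`: if the meta-classifier were trained on base-model scores for samples those models had already fit, it would learn to over-trust them. In a real pipeline each `modality` matrix would be replaced by the output of a modality-specific model (for example BERT logits for page text), and the meta-classifier could be any learner.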