Porn2Vec: A Robust Framework for Detecting Pornographic Websites Based on Contrastive Learning.
Jun Zhao, Minglai Shao, Hao Peng, Hong Wang, Bo Li, Xudong Liu
DOI: https://doi.org/10.1016/j.knosys.2021.107296
IF: 8.139
2021-01-01
Knowledge-Based Systems
Abstract: Pornographic websites have become one of the largest sources of vulgar content, which seriously threatens the mental and physical health of juveniles. Unfortunately, existing pornography detection approaches are ineffective against pornographic websites that are armed with adversarial attack examples. In this paper, we propose Porn2Vec, a robust end-to-end framework for detecting pornographic websites using contrastive learning. Specifically, we first model pornographic websites as a heterogeneous graph consisting of websites, webpages, images, texts, and their interactive relationships, and formalize pornographic website detection as a node classification task on the graph. Subsequently, we present a novel contrastive-learning-based heterogeneous graph embedding method that learns high-level representations of websites by jointly aggregating image-based, text-based, and structure-based features. Finally, the learned website features are fed into a neural network to train an automatic model for pornographic website detection. Experimental results show that Porn2Vec outperforms existing state-of-the-art methods, demonstrating more promising and robust performance in detecting well-disguised pornographic websites equipped with adversarial attack examples.
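To make the graph formulation in the abstract concrete, the following is a minimal sketch of a typed (heterogeneous) graph in which website nodes are the classification targets. The four node types come from the abstract. The class name `HeteroGraph`, the relation names (`contains`, `embeds`, `has_text`, `links_to`), and the node identifiers are illustrative assumptions, not the authors' schema or implementation. The sketch covers only the data structure, not the contrastive embedding step.

```python
from collections import defaultdict

# Node types named in the abstract. Relation names used below are hypothetical.
NODE_TYPES = {"website", "webpage", "image", "text"}


class HeteroGraph:
    """Minimal typed graph: every node has a type, every edge a relation label."""

    def __init__(self):
        self.node_type = {}              # node id -> node type
        self.edges = defaultdict(list)   # (source id, relation) -> destination ids
        self.label = {}                  # website id -> 0/1, for labeled websites only

    def add_node(self, node_id, ntype, label=None):
        if ntype not in NODE_TYPES:
            raise ValueError(f"unknown node type: {ntype}")
        self.node_type[node_id] = ntype
        if label is not None:
            if ntype != "website":
                raise ValueError("only website nodes carry labels")
            self.label[node_id] = label

    def add_edge(self, src, relation, dst):
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("both endpoints must be added before the edge")
        self.edges[(src, relation)].append(dst)

    def neighbors(self, src, relation):
        # .get avoids creating empty entries in the defaultdict on lookup
        return list(self.edges.get((src, relation), []))

    def classification_targets(self):
        """Website nodes are the ones a node classifier would predict."""
        return sorted(n for n, t in self.node_type.items() if t == "website")


# Toy example: one labeled website, one unlabeled website reached by a link.
g = HeteroGraph()
g.add_node("site:a", "website", label=1)
g.add_node("site:b", "website")
g.add_node("page:a/1", "webpage")
g.add_node("img:1", "image")
g.add_node("txt:1", "text")
g.add_edge("site:a", "contains", "page:a/1")
g.add_edge("page:a/1", "embeds", "img:1")
g.add_edge("page:a/1", "has_text", "txt:1")
g.add_edge("page:a/1", "links_to", "site:b")
```

In this toy graph, `site:b` has no label of its own. A node classifier working on the graph would have to infer its class from features propagated through neighboring pages, images, and texts, which is the setting the abstract describes as node classification on the heterogeneous graph.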