An Abused Webpage Detection Method Based on Screenshots Text Recognition.

Yan-Ming Huang, Dongjie Liu, Zhiwei Yan, Yan-Ming Zhang, Guang-Gang Geng
DOI: https://doi.org/10.1145/3491396.3506562
2021-01-01
Abstract: With the rapid development of the Internet, webpages containing abusive content such as pornography and gambling have proliferated. These webpages use various techniques to evade traditional detection methods, which seriously degrades the Internet environment. Accurately identifying them is therefore increasingly important. In response, this paper proposes an abused webpage detection method based on screenshots that combines text recognition with text classification; it detects and classifies webpages efficiently by capturing the information actually visible to the user. The paper also runs a comparative experiment against the traditional web crawler method, which verifies the accuracy and advantages of the proposed method. This work provides technical support for combating illegal activities and cleaning up the Internet environment.
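The three stages named in the abstract (screenshot capture, text recognition, text classification) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the names `detect`, `classify_text`, `Verdict`, and `KEYWORDS` are hypothetical, the capture and recognition stages are passed in as plain callables so that any headless browser or OCR engine could be plugged in, and a simple keyword match stands in for the paper's text classifier, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set

# Hypothetical keyword lists. A keyword match is only a stand-in here;
# the abstract does not say which text classifier the paper uses.
KEYWORDS: Dict[str, Set[str]] = {
    "gambling": {"casino", "betting", "jackpot"},
    "pornography": {"porn", "xxx"},
}


@dataclass
class Verdict:
    url: str
    label: str  # one of the KEYWORDS keys, or "benign"
    text: str   # text recovered from the screenshot


def classify_text(text: str) -> str:
    """Return the category with the most keyword hits, or "benign" if none hit."""
    tokens = set(text.lower().split())
    best_label, best_hits = "benign", 0
    for label, words in KEYWORDS.items():
        hits = len(tokens & words)
        if hits > best_hits:
            best_label, best_hits = label, hits
    return best_label


def detect(
    url: str,
    capture: Callable[[str], bytes],    # assumed: renders the page, returns image bytes
    recognize: Callable[[bytes], str],  # assumed: OCR over the image, returns text
) -> Verdict:
    """Screenshot the page, recognize its visible text, then classify that text."""
    image = capture(url)
    text = recognize(image)
    return Verdict(url=url, label=classify_text(text), text=text)
```

Because the classifier only sees text recovered from the rendered image, it works on what the user actually sees rather than on the HTML source, which is the contrast with the crawler baseline that the abstract draws.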