Noise analysis for text-based spam images

Peng Li, Hanbing Yan, Gang Cui, Yuejin Du
2012-01-01
Journal of Information and Computational Science
Abstract: Traditional spam filters face growing challenges from the rapid growth of image-based spam. Previous work has applied OCR techniques and text classifiers to image spam detection, but these are time consuming and CPU intensive. In addition, OCR is easily defeated by the noise and content-obscuring elements that spammers add. In this paper, we propose a novel approach that detects the "amount" and the "type" of noise introduced by such anti-OCR techniques. First, we propose a text region localization method based on a steerable filter and morphological processing, which separates an image into text regions for OCR content extraction and background regions for noise analysis. Next, a wavelet transform is used to construct a noise feature image of the background region, from which the noise is measured and classified. Experimental results show that our method locates the text region accurately and that the noise analysis effectively reflects the noise interference in spam images. The method can therefore complement OCR-based approaches and further reduce the false positives of image spam filters. © 2012 Binary Information Press.
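The abstract does not state which wavelet or which noise measure is used. As a rough illustration of the wavelet-based noise measurement step only, the sketch below assumes a one-level Haar transform and the common robust estimate sigma = median(|HH|) / 0.6745 computed on the diagonal detail subband. The function names `haar_diagonal_detail` and `estimate_noise_sigma` are hypothetical, and the input is assumed to be a background region already separated from the text region.

```python
import random
import statistics


def haar_diagonal_detail(image):
    """Return the diagonal (HH) subband of a one-level 2D Haar transform.

    `image` is a list of equal-length rows of grey levels. A trailing odd
    row or column is ignored.
    """
    rows = len(image) - len(image) % 2
    cols = len(image[0]) - len(image[0]) % 2
    detail = []
    for r in range(0, rows, 2):
        detail_row = []
        for c in range(0, cols, 2):
            top_left, top_right = image[r][c], image[r][c + 1]
            bottom_left, bottom_right = image[r + 1][c], image[r + 1][c + 1]
            # Orthonormal Haar diagonal coefficient of each 2x2 block.
            detail_row.append(
                (top_left - top_right - bottom_left + bottom_right) / 2.0
            )
        detail.append(detail_row)
    return detail


def estimate_noise_sigma(background):
    """Assumed noise measure: median absolute HH coefficient / 0.6745."""
    hh = haar_diagonal_detail(background)
    magnitudes = [abs(value) for row in hh for value in row]
    return statistics.median(magnitudes) / 0.6745


if __name__ == "__main__":
    random.seed(0)
    # Synthetic backgrounds: one flat, one with additive Gaussian noise.
    flat = [[128] * 64 for _ in range(64)]
    noisy = [[128 + random.gauss(0, 10) for _ in range(64)] for _ in range(64)]
    print("flat background :", estimate_noise_sigma(flat))
    print("noisy background:", estimate_noise_sigma(noisy))
```

A clean, uniform background yields an estimate of zero, while added noise raises it roughly in proportion to the noise standard deviation. Classifying the noise "type", as the abstract describes, would need further features that are not sketched here.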