Hierarchical Semantic Model for Objectionable Web Text Content Detection

Jiangjiao Duan, Jianping Zeng, Shiyong Zhang
DOI: https://doi.org/10.1109/icasid.2012.6325325
2012-01-01
Abstract: Objectionable Web text content has recently become widespread on many websites. Because this kind of content has been shown to be very harmful to young children, several measures have been taken to detect it. Unlike current methods, the proposed scene-based method recognizes objectionable text with the aim of improving performance, especially in semantic detection. A scene, defined by a set of sentences, is taken to represent the topics of objectionable content. A hierarchical semantic model that describes the scene at different granularities is then learned from the sentence set. Objectionable Web text is detected based on the similarity between the text and the model. Experiments on real-world text sets collected from Web forums show that the proposed method achieves better performance than a keyword-based method with semantic feature selection. The model's ability to detect semantically objectionable text is studied by varying several key parameters of the model.
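The detection step described in the abstract (score a text by its similarity to a model learned from a scene's sentence set) can be illustrated with a minimal sketch. The abstract does not specify the model's structure or the similarity measure, so everything here is an assumption made for illustration: a two-level bag-of-words model (one vector per sentence, one for the whole scene), cosine similarity, and the hypothetical names `build_scene_model`, `scene_similarity`, `is_objectionable`, `fine_weight`, and `threshold`. It is not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (tok.strip(".,;:!?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_scene_model(scene_sentences):
    """Assumed two-level model: a vector per sentence plus a scene-wide vector."""
    sentence_vectors = [Counter(tokenize(s)) for s in scene_sentences]
    scene_vector = Counter()
    for vec in sentence_vectors:
        scene_vector.update(vec)
    return {"sentences": sentence_vectors, "scene": scene_vector}


def scene_similarity(text, model, fine_weight=0.5):
    """Blend coarse (whole-scene) and fine (best-matching sentence) similarity."""
    text_vector = Counter(tokenize(text))
    coarse = cosine(text_vector, model["scene"])
    fine = max((cosine(text_vector, v) for v in model["sentences"]), default=0.0)
    return fine_weight * fine + (1.0 - fine_weight) * coarse


def is_objectionable(text, model, threshold=0.3):
    """Flag a text when its blended similarity reaches the (assumed) threshold."""
    return scene_similarity(text, model) >= threshold
```

In this sketch, `fine_weight` and `threshold` stand in for the kind of key parameters the abstract says were varied; the paper's actual parameters are not stated in the abstract.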