Adaptive Topic Modeling for Detection Objectionable Text

Jianping Zeng, Jiangjiao Duan, Chengrong Wu
DOI: https://doi.org/10.1109/wi-iat.2013.54
2013-01-01
Web Intelligence
Abstract: Objectionable text content on the Web is harmful to young children. Although keyword-based methods are faster at detection, they fail to detect text content that is semantically objectionable. A novel framework based on adaptive topic modeling is proposed to detect objectionable text content. First, a weighted graph is constructed from several seed words and a set of training texts. Feature words are then selected from the graph according to a measure of how likely a word is to be sensitive. An adaptive LDA (Latent Dirichlet Allocation) topic model, in which the number of topics is estimated automatically, is proposed to find the latent objectionable topic structure of the text set. An objectionable topic criterion is devised for the adaptive selection method, which takes the characteristics of objectionable topics into consideration. Finally, a given text is evaluated for detection based on its probability value with respect to the model. Extensive comparison experiments on real-world text sets show that the proposed method can effectively detect objectionable text. Its performance is superior to that of keyword-based methods that use several different approaches to generate the keyword list. Experiments also show that the performance is better than that of detection methods based on traditional topic modeling.
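The first two steps of the framework (building a weighted graph from seed words and training texts, then selecting feature words by a sensitivity measure) can be illustrated with a minimal Python sketch. The abstract does not define the edge weights or the measure, so everything here is an assumption made for illustration: edges are weighted by document-level co-occurrence counts, and the hypothetical `sensitivity_scores` function scores a word by the share of its edge weight that connects to seed words. The function names, the toy texts, and the seed word are all invented and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(texts):
    """Return {(w1, w2): weight} with w1 < w2.

    Assumed weighting: number of texts in which both words occur.
    """
    edges = Counter()
    for text in texts:
        words = sorted(set(text.lower().split()))
        for a, b in combinations(words, 2):
            edges[(a, b)] += 1
    return edges


def sensitivity_scores(edges, seed_words):
    """Hypothetical measure: fraction of a word's edge weight linked to seeds."""
    seeds = set(seed_words)
    to_seed = Counter()
    total = Counter()
    for (a, b), weight in edges.items():
        total[a] += weight
        total[b] += weight
        if b in seeds:
            to_seed[a] += weight
        if a in seeds:
            to_seed[b] += weight
    return {word: to_seed[word] / total[word] for word in total}


def select_features(scores, seed_words, top_k):
    """Pick the top_k non-seed words by score, breaking ties alphabetically."""
    seeds = set(seed_words)
    ranked = sorted(
        (word for word in scores if word not in seeds),
        key=lambda word: (-scores[word], word),
    )
    return ranked[:top_k]


# Toy data, invented for illustration only.
texts = [
    "bet casino jackpot",
    "casino jackpot win",
    "weather sunny walk",
    "walk park sunny",
]
seeds = ["casino"]

edges = build_cooccurrence_graph(texts)
scores = sensitivity_scores(edges, seeds)
features = select_features(scores, seeds, top_k=3)
print(features)
```

In this toy setup, words that co-occur with the seed word rank above words that never do. In the framework described above, the selected feature words would then feed the adaptive LDA model; that step is not sketched here because the abstract does not specify the objectionable topic criterion.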