Junk Post Filtering in Web Forums

Chen Lin, Wei Wang
2009-01-01
Journal of Computer Research and Development
Abstract: Web forums have emerged as new platforms for information sharing and group collaboration among Web users. With their large volume of accumulated knowledge, Web forums have become valuable sources for data mining in recent years. However, the performance of such data mining applications is usually harmed by low-quality posts in Web forums. This paper focuses on the problem of filtering low-quality posts (junk posts) in Web forums. Inspired by latent Dirichlet allocation (LDA), two graphical models, the clustered junk topic model and the clustered author junk topic model, are built for detecting junk posts. The models utilize text content, reply linkages, and author information. In contrast to traditional approaches such as classification methods, these models require no training and no hard-coded rule sets. The presented models help not only to understand the process by which posts are generated but also to quantitatively evaluate the quality of post content in Web forums. Experiments conducted on a real Web forum show that this approach achieves better results than traditional methods.