Topic Detection in Instant Messages

Han Zhang, Chang-Dong Wang, Jian-Huang Lai
DOI: https://doi.org/10.1109/ICMLA.2014.41
2014-01-01
Abstract: In the past few years, instant messaging (IM) has become widely used in daily communication. However, because topics are scattered and much of the chatting is meaningless, online IM groups fill up with useless messages. To help IM users grasp what a group is talking about without reading every message, topic discovery in instant messages has become a significant but challenging research task. In this paper, we propose a new method for topic detection in instant messages that is applicable to cases where 1) useless terms keep emerging, 2) the instant messages are very short, and 3) multiple languages are used. The basic step is to treat each message in an online group discussion as a data item in a message stream and then apply PLSA to the collected instant messages. A strategy is designed to segment multilingual messages without using machine translation and to remove the useless words that keep emerging. Extensive experiments on real-world QQ group data confirm the effectiveness of the proposed method.
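To make the PLSA step concrete, here is a minimal sketch of generic PLSA fitted by EM, with each message treated as one document. It is not the authors' implementation: the function name `plsa`, its parameters, and the toy messages are assumptions made for illustration, and the paper's multilingual segmentation and useless-word removal are not shown. Messages are assumed to be already tokenized and the vocabulary non-empty.

```python
import random
from collections import Counter


def plsa(docs, n_topics, n_iter=50, seed=0):
    """Fit generic PLSA by EM on tokenized documents (hypothetical helper).

    Returns (vocab, p_w_z, p_z_d), where p_w_z[z][w] approximates P(w|z)
    and p_z_d[d][z] approximates P(z|d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    # n(d, w): term counts per message, keyed by vocabulary index.
    counts = [Counter(index[w] for w in doc) for doc in docs]

    def normalize(row):
        total = sum(row)
        if total > 0:
            return [x / total for x in row]
        return [1.0 / len(row)] * len(row)  # fall back to uniform

    # Random initialisation of P(w|z) and P(z|d).
    p_w_z = [normalize([rng.random() + 1e-3 for _ in vocab])
             for _ in range(n_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in docs]

    for _ in range(n_iter):
        new_w_z = [[0.0] * len(vocab) for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in docs]
        for d, doc_counts in enumerate(counts):
            for w, n in doc_counts.items():
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                post = normalize([p_z_d[d][z] * p_w_z[z][w]
                                  for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: renormalize expected counts into distributions.
        p_w_z = [normalize(row) for row in new_w_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return vocab, p_w_z, p_z_d


# Toy, made-up messages; real IM data would need segmentation first.
messages = [
    ["exam", "math", "homework"],
    ["math", "exam", "tomorrow"],
    ["football", "match", "tonight"],
    ["match", "football", "score"],
]
vocab, p_w_z, p_z_d = plsa(messages, n_topics=2)
for z, row in enumerate(p_w_z):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print(z, [w for _, w in top])
```

The highest-probability words under each P(w|z) serve as a rough topic summary. Because instant messages are very short, per-message counts are sparse, which is one reason plain PLSA alone may struggle on such data.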