Interpretative Topic Categorization Via Deep Multiple Instance Learning

Tong Yu, Meng Wang, Yanzhang Lv, Luguo Xue, Jun Liu
DOI: https://doi.org/10.1109/IJCNN.2018.8489395
2018-01-01
Abstract: Given a document stream, categorizing the topics of the documents while highlighting their key information is critical to improving user reading efficiency. In practical applications, previous work on topic categorization, which focuses only on assigning each document to a given category, is not enough. An interpretative topic categorization that also extracts the characteristic words of each document is needed to assist user reading. Based on the instance-bag relationship between words and the document, we propose a multiple instance learning (MIL) method to tackle this problem. Another problem in traditional topic categorization is deficient word representation. In this paper, we use a Bi-directional Long Short-Term Memory network (Bi-LSTM) and a convolutional neural network (CNN) to capture both the local sequential feature and the global context feature, forming a comprehensive word representation. Finally, we design an effective model, the Interpretative Topic Categorization Model (ITCM), which exploits the MIL property with deep learning to classify documents into predefined topics and discover their characteristic words simultaneously. We conduct two groups of experiments and show that ITCM not only achieves convincing performance on topic categorization, but also effectively identifies which words characterize the document category without word-level supervision.
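The instance-bag idea in the abstract can be illustrated with a minimal sketch: each word (instance) receives a score per topic, the document (bag) score is aggregated from its words, and the highest-scoring words for the predicted topic serve as the characteristic words. Everything specific here is an assumption for illustration only: the abstract does not state how ITCM aggregates word scores, so max pooling is used as one common MIL choice; the per-word scores are made-up numbers standing in for the output of a Bi-LSTM and CNN encoder; and the names `softmax` and `classify_bag` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify_bag(words, word_scores, topics):
    """Classify a document (bag) from per-word (instance) topic scores.

    words:       list of tokens in the document
    word_scores: one list of per-topic scores for each word
    topics:      list of topic names, aligned with the score columns
    """
    # Assumption: max pooling over instances. The bag score for a topic
    # is the score of its single most indicative word.
    bag_scores = [max(ws[k] for ws in word_scores) for k in range(len(topics))]
    probs = softmax(bag_scores)
    best = max(range(len(topics)), key=lambda k: probs[k])

    # Rank words by their score for the predicted topic. No word-level
    # labels are involved; only the document label would supervise training.
    ranked = sorted(
        zip(words, (ws[best] for ws in word_scores)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return topics[best], probs, ranked


# Toy document with invented scores (columns: sports, finance).
words = ["the", "striker", "scored", "a", "goal"]
topics = ["sports", "finance"]
word_scores = [
    [0.1, 0.1],
    [2.0, 0.2],
    [1.5, 0.4],
    [0.0, 0.1],
    [2.5, 0.3],
]

label, probs, ranked = classify_bag(words, word_scores, topics)
print(label)
print([w for w, _ in ranked[:3]])
```

In a trained model the per-word scores would be learned from document-level labels alone, which is what makes the word ranking a by-product of classification rather than a separately supervised task.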