Knowledge-based Document Embedding for Cross-Domain Text Classification
Yiming Li, Baogang Wei, Liang Yao, Hui Chen, Zherong Li
DOI: https://doi.org/10.1109/ijcnn.2017.7966016
2017-01-01
Abstract: Cross-domain text classification aims to train an accurate model for a target domain by using labeled text data from a source domain whose data distribution is different but related. To narrow the distribution gap between domains, most previous approaches use the bag-of-words model to obtain a latent feature representation of the text. However, this kind of model loses word order information and misses the background knowledge of the text. As a result, the conceptual information of the text is largely ignored. In this paper, we propose a novel framework named Document Concept Vector for cross-domain text classification, which leverages both a neural network and a knowledge base to produce a high-quality representation of the text. Specifically, a raw document is first transformed into a conceptualized document, which consists of a set of concepts, by utilizing a large taxonomy knowledge base. The conceptualized document is then transformed into a document vector through the neural network, and this vector is used as the concept-level feature of the original document. Finally, we conduct experiments on two real-world corpora and compare the framework with both traditional classification algorithms and several state-of-the-art approaches to cross-domain text classification. The results validate the effectiveness of our framework.
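The two-step pipeline described in the abstract (conceptualize the document, then embed the concepts) can be illustrated with a minimal sketch. This is not the authors' implementation. The toy taxonomy, the function names (`conceptualize`, `concept_vector`, `embed_document`), the choice to drop terms missing from the taxonomy, and the vector size are all assumptions made for illustration. In particular, averaging fixed pseudo-random concept vectors is only a stand-in for the neural network, whose architecture the abstract does not specify.

```python
import random

# Hypothetical toy taxonomy (term -> concept). The paper uses a large
# taxonomy knowledge base; this dictionary is an illustrative stand-in.
TOY_TAXONOMY = {
    "python": "programming_language",
    "java": "programming_language",
    "iphone": "smartphone",
    "galaxy": "smartphone",
    "apple": "company",
    "samsung": "company",
}


def conceptualize(text, taxonomy=TOY_TAXONOMY):
    """Map each known term to its concept.

    Assumption: terms absent from the taxonomy are dropped, and
    tokenization is a plain lowercase whitespace split.
    """
    tokens = text.lower().split()
    return [taxonomy[t] for t in tokens if t in taxonomy]


def concept_vector(concept, dim=8):
    """Return a fixed pseudo-random vector for a concept.

    Stand-in for a learned embedding: seeding the generator with the
    concept name makes the vector reproducible across calls.
    """
    rng = random.Random(concept)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def embed_document(text, dim=8):
    """Average the concept vectors of a conceptualized document."""
    concepts = conceptualize(text)
    if not concepts:
        return [0.0] * dim  # no known concepts: return a zero vector
    vectors = [concept_vector(c, dim) for c in concepts]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


if __name__ == "__main__":
    print(conceptualize("Apple iPhone review"))
    print(embed_document("Apple iPhone") == embed_document("Samsung Galaxy"))
```

In this toy setup, "Apple iPhone" and "Samsung Galaxy" share no surface words but map to the same concept sequence, so they receive the same vector. That is the intuition behind using concept-level features to bridge domains whose vocabularies differ.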