Learning Semantic Topics for Domain-Adapted Textual Knowledge Transfer

Jie Fu, Shucheng Huang, Tianzhu Zhang, Changsheng Xu
DOI: https://doi.org/10.1145/3240876.3240922
2018-01-01
Abstract: Traditional text classification methods make a basic assumption: the training and test data come from the same domain. This naive assumption may not hold in the real world. Hence, this paper studies the problem of domain-adapted news text classification, whereby a model is trained on labeled data from one source domain and can then be deployed on another, target domain. To realize cross-domain text classification, we propose a domain-adapted text classification method based on the LDA topic model and the TextCNN model, named TextLDACNN. Specifically, our work calculates the topic similarity between the source and target domains, which serves as an effective constraint to regularize the training process and hence improve the generalization of the source model to the target domain. A text classifier trained with the unsupervised topic feature representation clearly outperforms the baseline TextCNN model. The results show that our method achieves an approximately 4.0% improvement over the state-of-the-art method.
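The idea of using topic similarity between domains as a training constraint can be illustrated with a minimal sketch. The abstract does not say which similarity measure or loss form TextLDACNN uses, so everything here is an assumption made for illustration: the Jensen-Shannon divergence as the measure, averaging per-document topic distributions into one distribution per domain, and the hypothetical names `mean_topic_distribution`, `js_divergence`, `regularized_loss`, and `weight`. The per-document topic distributions are assumed to come from an already fitted LDA model.

```python
import math


def mean_topic_distribution(doc_topics):
    """Average per-document topic distributions into one domain-level distribution."""
    n_topics = len(doc_topics[0])
    totals = [0.0] * n_topics
    for dist in doc_topics:
        for k, p in enumerate(dist):
            totals[k] += p
    return [t / len(doc_topics) for t in totals]


def kl_divergence(p, q):
    """KL(p || q), skipping terms where p is zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, 0 for identical distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def regularized_loss(classification_loss, source_topics, target_topics, weight=0.1):
    """Add a topic-mismatch penalty to the classifier's loss (assumed form)."""
    penalty = js_divergence(
        mean_topic_distribution(source_topics),
        mean_topic_distribution(target_topics),
    )
    return classification_loss + weight * penalty
```

In this sketch, a larger divergence between the source and target topic distributions raises the total loss, so the `weight` parameter controls how strongly the domain mismatch influences training relative to the classification objective.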