Exploiting Associations Between Word Clusters and Document Classes for Cross-domain Text Categorization
Fuzhen Zhuang, Ping Luo, Hui Xiong, Zhongzhi Shi, Qing He, Yuhong Xiong
DOI: https://doi.org/10.1137/1.9781611972801.2
2010-01-01
Abstract: Cross-domain text categorization aims to adapt the knowledge learned from a labeled source domain to an unlabeled target domain, where the documents of the two domains are drawn from different distributions. Although the distributions of raw word features differ, the associations between word clusters (conceptual features) and document classes may remain stable across domains. In this paper, we exploit these unchanged associations as the bridge of knowledge transformation from the source domain to the target domain by means of nonnegative matrix tri-factorization. Specifically, we formulate a joint optimization framework of two matrix tri-factorizations, one for the source-domain data and one for the target-domain data, in which the associations between word clusters and document classes are shared. We then give an iterative algorithm for this optimization and theoretically show its convergence. Comprehensive experiments show the effectiveness of the method. In particular, we show that the proposed method can handle some difficult scenarios in which baseline methods usually do not perform well.
Published in: Proceedings of the 2010 SIAM International Conference on Data Mining (SDM), pp. 13-24. Published: 2010. ISBN: 978-0-89871-703-7. eISBN: 978-1-61197-280-1. Book DOI: https://doi.org/10.1137/1.9781611972801. Book Series Name: Proceedings. Book Code: PR136. Book Pages: 1-953.
Key words: Cross-domain Learning, Domain Adaption, Transfer Learning, Text Categorization
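The joint factorization described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives neither the objective nor the update rules, so the code assumes a plain squared-error objective, ||X_s - F_s S G_s^T||^2 + ||X_t - F_t S G_t^T||^2, and generic NMF-style multiplicative updates. The function name `joint_tri_factorization`, all variable names, and the choice to hold the source document-class matrix fixed at the known labels are assumptions made for illustration only; any normalization constraints or trade-off weights the paper may use are omitted. The one feature taken from the abstract is that the word-cluster/document-class association matrix `S` is shared by both domains.

```python
import numpy as np

def joint_tri_factorization(X_s, Y_s, X_t, k, n_iter=100, eps=1e-9, seed=0):
    """Illustrative joint tri-factorization with a shared association matrix S.

    X_s: (m, n_s) nonnegative word-by-document matrix, source domain
    Y_s: (n_s,) integer class labels of the source documents
    X_t: (m, n_t) nonnegative word-by-document matrix, target domain
    k:   number of word clusters (assumed hyperparameter)
    """
    rng = np.random.default_rng(seed)
    m, n_t = X_t.shape
    Y_s = np.asarray(Y_s, dtype=int)
    c = int(Y_s.max()) + 1

    # Assumption: source document-class indicators stay fixed at the labels.
    G_s = np.eye(c)[Y_s]                      # (n_s, c), one-hot
    F_s = rng.random((m, k)) + eps            # source word-cluster factors
    F_t = rng.random((m, k)) + eps            # target word-cluster factors
    S = rng.random((k, c)) + eps              # shared associations
    G_t = rng.random((n_t, c)) + eps          # target document-class factors

    history = []
    for _ in range(n_iter):
        # Multiplicative updates keep every factor nonnegative.
        F_s *= (X_s @ G_s @ S.T) / (F_s @ S @ G_s.T @ G_s @ S.T + eps)
        F_t *= (X_t @ G_t @ S.T) / (F_t @ S @ G_t.T @ G_t @ S.T + eps)
        G_t *= (X_t.T @ F_t @ S) / (G_t @ S.T @ F_t.T @ F_t @ S + eps)
        # S receives contributions from both domains: this is the "bridge".
        num = F_s.T @ X_s @ G_s + F_t.T @ X_t @ G_t
        den = (F_s.T @ F_s @ S @ G_s.T @ G_s
               + F_t.T @ F_t @ S @ G_t.T @ G_t + eps)
        S *= num / den

        loss = (np.linalg.norm(X_s - F_s @ S @ G_s.T) ** 2
                + np.linalg.norm(X_t - F_t @ S @ G_t.T) ** 2)
        history.append(loss)
    return F_s, F_t, S, G_t, history
```

Under these assumptions, a target document's predicted class would be read off as `G_t.argmax(axis=1)`, and `history` records the objective after each iteration so that its decrease can be inspected.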