DIAS: A Disassemble-Assemble Framework for Highly Sparse Text Clustering
Hongfu Liu, Junjie Wu, Dacheng Tao, Yuchao Zhang, Yun Fu
DOI: https://doi.org/10.1137/1.9781611974010.86
2015-01-01
Proceedings of the 2015 SIAM International Conference on Data Mining (SDM), pp. 766-774
Abstract: Despite extensive study, text clustering remains a critical challenge in the data mining community. Although various techniques have been proposed to overcome some of these challenges, problems remain when dealing with weakly related or even noisy features. In response, we propose a DIsassemble-ASsemble (DIAS) framework for text clustering. DIAS employs simple random feature sampling to disassemble high-dimensional text data and gain diverse structural knowledge; this also helps avoid the bulk of noisy features. The resulting multi-view knowledge is then assembled by weighted Information-theoretic Consensus Clustering (ICC) to obtain a high-quality consensus partitioning. Extensive experiments on eight real-world text data sets demonstrate the advantages of DIAS over other widely used methods. In particular, DIAS performs well even when learning from very weak basic partitionings. In addition, its natural suitability for distributed computing makes DIAS a promising candidate for big text clustering.
Published: 2015. eISBN: 978-1-61197-401-0
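The disassemble/assemble pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It rests on several assumptions that are not taken from the paper: the basic partitionings come from plain Euclidean k-means on L2-normalized, randomly sampled feature subsets; the weighted ICC step is approximated as K-means on the concatenated one-hot label matrix with a weighted KL-divergence distance, in the style of K-means-based consensus clustering; the consensus is initialized from the heaviest basic partitioning; and the weights default to uniform. The function names (`disassemble`, `assemble`, `dias`) and parameters (`n_views`, `sample_rate`) are hypothetical, and the abstract does not state how ICC's weights are actually chosen.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=30):
    """Plain Lloyd's k-means (Euclidean); returns one label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if cluster j is empty
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def disassemble(X, k, n_views, sample_rate, rng):
    """Cluster random feature subsets to obtain n_views basic partitionings."""
    n_features = X.shape[1]
    m = max(1, int(sample_rate * n_features))
    parts = []
    for _ in range(n_views):
        cols = rng.choice(n_features, size=m, replace=False)
        Xs = X[:, cols].astype(float)
        norms = np.linalg.norm(Xs, axis=1, keepdims=True)
        Xs = Xs / np.where(norms > 0, norms, 1.0)  # L2-normalize non-empty rows
        parts.append(kmeans(Xs, k, rng))
    return parts


def assemble(parts, k, weights, n_iter=50, eps=1e-12):
    """Assumed weighted entropy-based consensus: K-means on one-hot label blocks.

    Cost of cluster j for a point = sum_i w_i * KL(onehot_i || centroid_ij),
    which for one-hot rows reduces to -sum_i w_i * log(centroid_ij[label_i]).
    """
    n = len(parts[0])
    blocks = [np.eye(p.max() + 1)[p] for p in parts]  # one-hot block per partitioning
    labels = parts[int(np.argmax(weights))].copy()  # heuristic init (assumption)
    for _ in range(n_iter):
        cost = np.zeros((n, k))
        for w, B, p in zip(weights, blocks, parts):
            cent = np.full((k, B.shape[1]), 1.0 / B.shape[1])  # uniform if cluster empty
            for j in range(k):
                mask = labels == j
                if mask.any():
                    cent[j] = B[mask].mean(axis=0)
            cost -= w * np.log(cent[:, p].T + eps)  # (n, k) weighted KL terms
        new_labels = cost.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


def dias(X, k, n_views=20, sample_rate=0.1, weights=None, seed=0):
    """Hypothetical end-to-end sketch: disassemble, then assemble."""
    rng = np.random.default_rng(seed)
    parts = disassemble(X, k, n_views, sample_rate, rng)
    if weights is None:
        weights = np.full(n_views, 1.0 / n_views)  # uniform weights (assumption)
    return assemble(parts, k, weights)
```

For example, `dias(X, k=3, n_views=5, sample_rate=0.5)` on a small document-term count matrix `X` returns one consensus label per document. Each view sees only a fraction of the vocabulary, so an individual basic partitioning may be weak; the consensus step only looks at the label vectors, which is also why each view could be computed independently on a different machine. A real text corpus would call for sparse matrices and a cosine-based (spherical) k-means rather than the dense Euclidean version used here.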