DIAS: A Disassemble-Assemble Framework for Highly Sparse Text Clustering
Hongfu Liu, Junjie Wu, Dacheng Tao, Yuchao Zhang, and Yun Fu
Proceedings of the 2015 SIAM International Conference on Data Mining (SDM), pp. 766-774
Chapter DOI: https://doi.org/10.1137/1.9781611974010.86
Published: 2015. eISBN: 978-1-61197-401-0. Book DOI: https://doi.org/10.1137/1.9781611974010. Book Series Name: Proceedings. Book Code: PRDT145. Book Pages: 1-976.

Abstract: Despite extensive study, text clustering remains a critical challenge in the data mining community. Although various techniques have been proposed to overcome some of these challenges, problems remain when dealing with weakly related or even noisy features. In response, we propose a DIsassemble-ASsemble (DIAS) framework for text clustering. DIAS employs simple random feature sampling to disassemble high-dimensional text data and gain diverse structural knowledge, which also helps avoid the bulk of noisy features. The multi-view knowledge is then assembled by weighted Information-theoretic Consensus Clustering (ICC) to obtain a high-quality consensus partitioning. Extensive experiments on eight real-world text data sets demonstrate the advantages of DIAS over other widely used methods. In particular, DIAS shows strengths in learning from very weak basic partitionings. In addition, its natural suitability for distributed computing makes DIAS a promising candidate for big text clustering.
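The disassemble-then-assemble idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe ICC in detail, so the assemble step below swaps in a simple weighted co-association vote (pairs of documents are linked when they share a cluster in more than a threshold share of the weighted views). The function names `kmeans`, `disassemble`, and `assemble`, the parameters `sample_ratio` and `threshold`, the uniform weights, and the toy term-count matrix are all assumptions made for illustration.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on lists of numbers; returns one label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Move each non-empty center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def disassemble(data, k, n_views, sample_ratio, rng):
    """Build basic partitionings, each from a random subset of the features."""
    n_features = len(data[0])
    m = max(1, int(sample_ratio * n_features))
    partitions = []
    for _ in range(n_views):
        cols = rng.sample(range(n_features), m)          # random feature sample
        view = [[row[j] for j in cols] for row in data]  # project onto the sample
        partitions.append(kmeans(view, k, rng))
    return partitions

def assemble(partitions, weights, threshold=0.5):
    """Weighted co-association consensus (a stand-in for ICC, not ICC itself).

    Two items are linked when the weighted share of partitions that put them
    in the same cluster exceeds `threshold`; the consensus clusters are the
    connected components of those links.
    """
    n = len(partitions[0])
    total = sum(weights)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            agree = sum(w for p, w in zip(partitions, weights) if p[i] == p[j])
            if agree / total > threshold:
                parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Toy term-count matrix: 6 documents x 6 terms (hypothetical data).
docs = [
    [3, 2, 0, 0, 1, 0],
    [2, 3, 0, 0, 0, 1],
    [3, 3, 0, 1, 0, 0],
    [0, 0, 3, 2, 1, 0],
    [0, 1, 2, 3, 0, 0],
    [0, 0, 3, 3, 0, 1],
]
rng = random.Random(0)
basic = disassemble(docs, k=2, n_views=5, sample_ratio=0.5, rng=rng)
consensus = assemble(basic, weights=[1.0] * len(basic))
```

Because each basic partitioning depends only on its own feature sample, the calls inside `disassemble` are independent of one another, which is the property the abstract points to when it mentions suitability for distributed computing. Note that the co-association vote does not fix the number of consensus clusters in advance, unlike a consensus method that takes the cluster count as input.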

Text Clustering Based on Asymmetric Similarity

TCUAP: A Novel Approach of Text Clustering Using Asymmetric Proximity.

An adaptive method for text domain similarity calculation

Semantic Correlation Network Based Text Clustering

A Novel Text Clustering Algorithm Based on Inner Product Space Model of Semantic

Efficient Phrase-Based Document Similarity for Clustering

Clustering Text Data Streams

A Novel Discrimination Structure for Assessing Text Semantic Similarity

A Clustering Algorithm for Short Documents Based On Concept Similarity

Comparison study of using semantic and syntactic network characteristics to do text clustering

Discriminative Similarity for Data Clustering

Contrastive Learning Subspace for Text Clustering

A New Suffix Tree Similarity Measure for Document Clustering

Text Clustering on Oral Conversation Corpus.

A Lda-Based Algorithm For Length-Aware Text Clustering

Constrained Coclustering for Textual Documents.

Text Similarity Measurement Method and Application of Online Medical Community Based on Density Peak Clustering

Feature Dimension Reduction Short Text Clustering Combined with Semantic and Statistics

Clustering articles based on semantic similarity

Cross-Lingual Document Clustering Based on Similarity Space Model