Integrating Vision-Language Semantic Graphs in Multi-View Clustering

JunLong Ke,Zichen Wen,Yechenhao Yang,Chenhang Cui,Yazhou Ren,Xiaorong Pu,Lifang He
DOI: https://doi.org/10.24963/ijcai.2024/472
2024-01-01
Abstract:In recent years, a variety of graph learning-based multi-view clustering (MVC) methods have emerged. However, these methods continue to face challenges in extracting latent features from real-world data, particularly in scenarios involving high-resolution color images and high-dimensional features. This task is notably difficult in cases where images are visually similar yet semantically diverse. To address this issue, we present a novel large-scale pre-trained model for multi-view clustering, named Integrate Vision-Language Semantic Graphs in Multi-View Clustering (IVSGMV), which harnesses the capabilities of visual-language pre-training models to enhance clustering performance and confronts issues in the unsupervised tuning of pre-trained models for multi-view data. We introduce an effective unsupervised approach for creating semantic graphs from image multi-view datasets using pre-trained encoders. Our method addresses the inherent spatial noise and imbalance in these encoders by employing graph filters and a joint process that integrates both image node and edge features. Additionally, we demonstrate the application of our approach to multi-view image clustering on extensive datasets, notably the high-resolution MVImgNet, achieving an impressive 82% accuracy. Furthermore, our method extends the zero-shot capabilities of large-scale pre-trained models, resulting in good performance in clustering tasks on untrained multi-view datasets.
What problem does this paper attempt to address?