A Survey and an Empirical Evaluation of Multi-view Clustering Approaches

Lihua Zhou, Guowang Du, Kevin Lü, Lizheng Wang, Jingwei Du
DOI: https://doi.org/10.1145/3645108
IF: 16.6
2024-02-08
ACM Computing Surveys
Abstract: Multi-view clustering (MVC) holds a significant role in domains like machine learning, data mining, and pattern recognition. Despite the development of numerous new MVC approaches employing various techniques, there remains a gap in comprehensive studies evaluating the characteristics and performance of these approaches. This gap hinders the in-depth understanding and rational utilization of the recently developed MVC techniques. This study formalizes the basic concepts of MVC and analyzes their techniques. It then introduces a novel taxonomy for MVC approaches and presents the working mechanisms and characteristics of representative MVC approaches developed in recent years. Moreover, it summarizes representative datasets and performance metrics commonly employed for evaluating MVC approaches. Furthermore, we have meticulously chosen thirty-five representative MVC approaches to conduct an empirical evaluation across seven real-world benchmark datasets, offering valuable insights into the realm of MVC approaches.
computer science, theory & methods
What problem does this paper attempt to address?
The main goal of this paper is to address several open issues in Multi-View Clustering (MVC) through a comprehensive review and an empirical evaluation. Specifically:

1. **Concepts and Technical Formalization**: The paper first formalizes the basic concepts and techniques of MVC, including multi-view data, the MVC problem definition, related principles, information fusion strategies, weight allocation strategies, and the clustering process.
2. **Proposing a New Classification System**: The authors propose a new classification system for MVC methods that divides them into four categories (complete, incomplete, uncertain, and dynamic MVC methods) and further subdivides each category.
3. **Summary of Representative Methods and Their Characteristics**: The paper describes in detail the working mechanisms and characteristics of representative MVC methods developed in recent years, and summarizes commonly used multi-view datasets and performance metrics.
4. **Empirical Evaluation**: 35 representative MVC methods were selected for empirical evaluation on 7 real-world benchmark datasets. The experimental results show that most MVC methods perform poorly on large-scale datasets, and that no method consistently maintains high performance across all types of datasets. The study also found that model structure, regularization constraints, and the weights assigned to different views are crucial for improving clustering performance.
5. **Filling Existing Research Gaps**: The paper pays special attention to methods proposed after 2019, especially those based on deep learning, which earlier studies overlooked.

In summary, this paper aims to give researchers and practitioners a comprehensive framework for understanding MVC by systematically reviewing and evaluating existing methods, and to guide them in selecting methods suited to specific applications.
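To make the ideas of information fusion and view weighting mentioned above more concrete, here is a minimal toy sketch in pure Python. It is not one of the surveyed methods: the function names (`pairwise_distances`, `fuse_views`, `threshold_clusters`), the sample data, the equal view weights, and the threshold are all assumptions chosen for illustration. Each view gets its own distance matrix, the matrices are combined as a weighted sum, and samples are grouped by a simple distance threshold as a stand-in for a real clustering step.

```python
from math import sqrt


def pairwise_distances(view):
    """Euclidean distance matrix for one view (a list of feature vectors)."""
    n = len(view)
    return [
        [sqrt(sum((a - b) ** 2 for a, b in zip(view[i], view[j]))) for j in range(n)]
        for i in range(n)
    ]


def fuse_views(views, weights):
    """Weighted sum of per-view distance matrices, each scaled to [0, 1]."""
    total = sum(weights)
    n = len(views[0])
    fused = [[0.0] * n for _ in range(n)]
    for view, w in zip(views, weights):
        d = pairwise_distances(view)
        d_max = max(max(row) for row in d) or 1.0  # avoid dividing by zero
        for i in range(n):
            for j in range(n):
                fused[i][j] += (w / total) * d[i][j] / d_max
    return fused


def threshold_clusters(dist, threshold):
    """Toy clustering: connected components of the graph dist < threshold."""
    n = len(dist)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] < threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Two hypothetical views describing the same four samples.
view_a = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
view_b = [[1.0], [1.2], [9.0], [9.1]]

fused = fuse_views([view_a, view_b], weights=[0.5, 0.5])
labels = threshold_clusters(fused, threshold=0.5)
print(labels)  # [0, 0, 1, 1]
```

Changing `weights` shifts how much each view contributes to the fused distances, which is the kind of design choice the summary describes as affecting clustering performance; real MVC methods typically learn such weights rather than fixing them by hand.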