Deep Online Probability Aggregation Clustering

Yuxuan Yan, Na Lu, Ruofan Yan
2024-07-13
Abstract: Combining machine clustering with deep models has shown remarkable advantages in deep clustering. This combination turns the data-processing pipeline into two alternating phases: feature clustering and model training. However, such an alternating schedule may cause instability and a heavy computational burden. We propose a centerless clustering algorithm called Probability Aggregation Clustering (PAC), designed to adapt readily to deep learning techniques so that it is easy to deploy in online deep clustering. PAC dispenses with cluster centers and aligns the probability space with the distribution space by formulating clustering as an optimization problem with a novel objective function. Based on the computation mechanism of PAC, we propose a general online probability aggregation module that performs stable and flexible feature clustering over mini-batch data, and we use it to construct a deep visual clustering framework, deep PAC (DPAC). Extensive experiments demonstrate that PAC achieves superior clustering robustness and performance, and that DPAC remarkably outperforms state-of-the-art deep clustering methods.
Machine Learning, Computer Vision and Pattern Recognition
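The abstract describes PAC only at a high level, so the following is an illustrative sketch, not the authors' algorithm. It assumes a centerless objective of the form J(P) = Σ_k Σ_{i≠j} p_ik² p_jk d_ij, where p_ik is the probability that sample i belongs to cluster k and d_ij is the squared Euclidean distance between samples i and j. Each pairwise distance is charged only to the extent that the two samples share a cluster, so no center is ever computed. The exact form of J, the exponent 2, and the function names (`pairwise_sq_dists`, `objective`, `solve_row`, `centerless_clustering`) are assumptions. J is minimized by block coordinate descent: with all other rows fixed, each row's subproblem is a convex quadratic over the probability simplex, solved here through its KKT conditions.

```python
import numpy as np


def pairwise_sq_dists(x):
    """Squared Euclidean distances d_ij between the rows of x, with d_ii = 0."""
    sq = np.sum(x * x, axis=1)
    d = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)  # clip round-off
    np.fill_diagonal(d, 0.0)
    return d


def objective(p, d):
    """Assumed centerless objective J(P) = sum_k sum_{i != j} p_ik^2 * p_jk * d_ij."""
    return float(np.sum(p ** 2 * (d @ p)))


def solve_row(s, t, n_bisect=100):
    """Minimize sum_k (s_k * q_k^2 + t_k * q_k) over the probability simplex.

    The KKT conditions give q_k = max(0, (lam - t_k) / (2 * s_k)); the multiplier
    lam is found by bisection so that the q_k sum to one.
    """
    lo, hi = t.min(), t.max() + 2.0 * s.max()  # sum(q) is 0 at lo and >= 1 at hi
    for _ in range(n_bisect):
        mid = 0.5 * (lo + hi)
        if np.maximum(0.0, (mid - t) / (2.0 * s)).sum() < 1.0:
            lo = mid
        else:
            hi = mid
    q = np.maximum(0.0, (hi - t) / (2.0 * s))
    return q / q.sum()  # remove the residual bisection error


def centerless_clustering(x, k, n_sweeps=30, seed=0, eps=1e-12):
    """Block coordinate descent on J: each row of P is re-solved exactly while all
    other rows are held fixed. No cluster center is ever computed."""
    rng = np.random.default_rng(seed)
    d = pairwise_sq_dists(x)
    p = rng.dirichlet(np.ones(k), size=x.shape[0])  # random soft assignments
    history = [objective(p, d)]
    for _ in range(n_sweeps):
        for i in range(x.shape[0]):
            # With d symmetric and d_ii = 0, the part of J that depends on row i is
            # sum_k (s_k * p_ik^2 + t_k * p_ik), with s and t computed from the other rows.
            s = d[i] @ p + eps   # s_k = sum_j d_ij p_jk (eps avoids division by zero)
            t = d[i] @ p ** 2    # t_k = sum_j d_ij p_jk^2
            p[i] = solve_row(s, t)
        history.append(objective(p, d))
    return p, history
```

Because every row update minimizes J with the other rows held fixed, `history` should be non-increasing up to bisection tolerance. On two well-separated Gaussian blobs, `p.argmax(axis=1)` should give each blob its own label.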
What problem does this paper attempt to address?
The paper primarily addresses instability and computational burden in deep clustering, particularly in online deep clustering. Combining machine clustering algorithms with deep models has shown clear advantages for deep clustering, but alternating between feature clustering and model training can destabilize learning and increase computational cost. To solve these problems, the authors propose Probability Aggregation Clustering (PAC), a centerless clustering algorithm that formulates clustering as an optimization problem and aligns the probability space with the distribution space through a novel objective function that quantifies intra-cluster distances. Building on the computation mechanism of PAC, they further propose a general Online Probability Aggregation (OPA) module that performs stable and flexible feature clustering on mini-batch data, and they use it to construct a deep visual clustering framework called Deep PAC (DPAC). DPAC targets the shortcomings of two major families of existing deep clustering methods, batch clustering and contrastive clustering, and achieves efficient online clustering through the OPA module without complex hyperparameter settings or additional components. Because OPA imposes no constraints on cluster size, it also allows more flexible partitioning.
The main contributions of the paper are:
- A new centerless partition clustering method, PAC, which performs clustering by exploring the latent relationship between sample distribution and assignment probability.
- An online deep clustering module, OPA, built on PAC, which encodes spatial distances into online clustering without introducing a large number of hyperparameters or components, and which drops any restriction on cluster size.
- A simple end-to-end unsupervised deep clustering framework, DPAC, which provides stable clustering, global clustering capability, and superior performance; DPAC significantly outperforms state-of-the-art methods on five challenging image benchmark datasets.
In summary, the paper proposes new solutions to key challenges in deep clustering and validates their effectiveness through experiments.
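The summary does not specify how OPA turns mini-batch features into a training signal. The following is therefore a hypothetical, deliberately simplified stand-in that only illustrates centerless, distance-aware assignment on a mini-batch; it is not the update rule derived in the paper. It assumes that the distance from sample i to cluster k is the probability-weighted mean squared distance to the samples in the batch, a_ik = Σ_j p_jk d_ij / Σ_j p_jk, computed from the model's current soft predictions p, and that the model is then trained toward softmax(−a_ik / τ). The names `aggregated_distances`, `online_targets`, `pseudo_label_loss` and the temperature `tau` are illustrative assumptions, not the paper's API.

```python
import numpy as np


def aggregated_distances(features, probs, eps=1e-12):
    """a_ik = sum_j p_jk d_ij / sum_j p_jk: probability-weighted mean squared distance
    from sample i to the (soft) members of cluster k in the mini-batch.
    No cluster centers are computed or stored."""
    sq = np.sum(features ** 2, axis=1)
    d = np.maximum(sq[:, None] + sq[None, :] - 2.0 * features @ features.T, 0.0)
    np.fill_diagonal(d, 0.0)
    return (d @ probs) / (probs.sum(axis=0) + eps)  # broadcast over clusters


def online_targets(features, probs, tau=1.0):
    """Hypothetical OPA-like step: turn the model's soft predictions on a mini-batch
    into pseudo-targets via a softmax over -a_ik / tau. No balancing constraint is
    imposed on cluster sizes."""
    logits = -aggregated_distances(features, probs) / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    targets = np.exp(logits)
    return targets / targets.sum(axis=1, keepdims=True)


def pseudo_label_loss(probs, targets, eps=1e-12):
    """Batch-averaged cross-entropy of the predictions against fixed targets.
    In a deep learning framework the targets would be detached from the gradient."""
    return float(-np.mean(np.sum(targets * np.log(probs + eps), axis=1)))
```

No balancing step is applied to the targets, which mirrors the summary's point that OPA places no constraint on cluster size. Because `tau` is tied to the scale of the squared distances, features would likely need to be normalized before such a step is used in practice.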