Abstract: The structure of many real-world datasets is intrinsically hierarchical, making the modeling of such hierarchies a critical objective in both unsupervised and supervised machine learning. Recently, novel approaches for hierarchical clustering with deep architectures have been proposed. In this work, we take a critical perspective on this line of research and demonstrate that many approaches exhibit major limitations when applied to realistic datasets, partly due to their high computational complexity. In particular, we show that a lightweight procedure implemented on top of pre-trained non-hierarchical clustering models outperforms models designed specifically for hierarchical clustering. Our proposed approach is computationally efficient and applicable to any pre-trained clustering model that outputs logits, without requiring any fine-tuning. To highlight the generality of our findings, we illustrate how our method can also be applied in a supervised setup, recovering meaningful hierarchies from a pre-trained ImageNet classifier.

What problem does this paper attempt to address?

The paper addresses the limitations of existing hierarchical clustering methods on real-world datasets. Specifically:

1. **High computational complexity**: Existing hierarchical clustering methods, especially those based on deep learning, cannot be applied effectively to large-scale datasets because of their complex architectures and training schemes.
2. **Insufficient performance**: On large-scale datasets, these methods are usually inferior to non-hierarchical clustering models, especially at the leaf-node level.
3. **Poor generality**: Many existing hierarchical clustering methods require specially designed models and complex training processes, which limits their scope of application.

To address these problems, the paper proposes a lightweight method called "Logits to Hierarchies (L2H)", which generates a hierarchical structure from any pre-trained non-hierarchical clustering model without fine-tuning the model. The method is computationally efficient and applies to both supervised and unsupervised learning scenarios.

### Main contributions of the paper

1. **Proposing a simple and efficient algorithm**: The L2H algorithm converts the logits of a pre-trained model into a hierarchical structure, significantly outperforming specially designed hierarchical clustering models while having low computational requirements. On a single CPU core, the algorithm completes the hierarchical clustering of an ImageNet-sized dataset within a few minutes.
2. **Revealing the limitations of existing methods**: The experimental results show that recently proposed hierarchical clustering methods have significant limitations on large-scale datasets, especially at the leaf-node level, where their performance is inferior to that of non-hierarchical clustering methods.
3. **Demonstrating the application in supervised learning**: By applying the L2H algorithm to a pre-trained ImageNet classifier, the paper shows how to recover a meaningful hierarchical structure from the classifier's logits, which helps to uncover potential biases of the model and ambiguities in existing classifications.

### Method overview

The core idea of the L2H algorithm is to construct a hierarchical structure by iteratively merging similar clusters. The specific steps are as follows:

1. **Initialization**: Initialize each cluster as a separate group.
2. **Calculate group scores**: For each group, calculate the aggregated value of the predicted probabilities of the data points within the group.
3. **Select the group with the lowest score**: Select the group with the lowest score for merging.
4. **Calculate the probability of re-assignment**: For the selected group, calculate the total predicted probability of data points in other groups being re-assigned to different clusters.
5. **Select the most relevant group**: Select the group that is most relevant to the current group for merging.
6. **Update the group and the hierarchical structure**: Merge the two groups and update the hierarchical structure.

### Experimental results

The paper reports experiments on three datasets: CIFAR-10, CIFAR-100, and Food-101. The results show that the L2H algorithm is superior to existing deep-learning methods in both the quality of hierarchical clustering and computational efficiency. In particular, the L2H algorithm maintains high performance on large-scale datasets without a significant increase in computational cost as the data scale grows.

### Case study

The paper also demonstrates the application of the L2H algorithm in supervised learning. Applied to a pre-trained ImageNet classifier, it recovers part of the WordNet hierarchical structure and reveals some potential biases of the model, such as some birds being misclassified as parrots or game birds.

In conclusion, the paper proposes an efficient and general-purpose hierarchical clustering method that addresses the limitations of existing methods on large-scale datasets and verifies its effectiveness on multiple datasets.
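The merging steps listed under the method overview can be sketched in a few lines of NumPy. This is an illustrative reading of the summary, not the authors' implementation. The summary does not specify the aggregation function or the exact re-assignment rule, so the sketch assumes that (a) a group's score is the sum of the probability mass its own points place on the group's clusters, and (b) re-assignment is computed by masking the selected group's logits for its own points and renormalizing with a softmax. The function name `logits_to_hierarchy` and the toy input are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax that tolerates -inf entries used for masking."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def logits_to_hierarchy(logits):
    """Greedily merge clusters into a binary hierarchy (illustrative sketch).

    logits: array of shape (n_points, n_clusters).
    Returns a list of merges; each merge is a pair of groups, and each
    group is a sorted tuple of cluster indices.
    """
    n_clusters = logits.shape[1]
    probs = softmax(logits)
    assign = logits.argmax(axis=1)  # hard cluster assignment per point

    # Step 1: every cluster starts as its own group.
    groups = [(c,) for c in range(n_clusters)]
    merges = []

    while len(groups) > 1:
        # Step 2 (assumed aggregation: sum): probability mass that the
        # points assigned to a group place on that group's clusters.
        scores = []
        for g in groups:
            members = np.isin(assign, g)
            scores.append(probs[members][:, list(g)].sum())

        # Step 3: pick the lowest-scoring group.
        i = int(np.argmin(scores))
        g = groups[i]

        # Step 4 (assumed rule): mask the group's own clusters for its
        # points and renormalize to see where their mass would go.
        members = np.isin(assign, g)
        masked = logits[members].astype(float)
        masked[:, list(g)] = -np.inf
        reassigned = softmax(masked)

        # Step 5: the most relevant group receives the most re-assigned mass.
        others = [j for j in range(len(groups)) if j != i]
        mass = [reassigned[:, list(groups[j])].sum() for j in others]
        j = others[int(np.argmax(mass))]

        # Step 6: merge the two groups and record the new internal node.
        merges.append((g, groups[j]))
        merged = tuple(sorted(g + groups[j]))
        groups = [h for idx, h in enumerate(groups) if idx not in (i, j)]
        groups.append(merged)

    return merges


# Toy input: clusters 0/1 are mutually confusable, as are clusters 2/3.
toy_logits = np.array([
    [4.0, 2.0, 0.0, 0.0],
    [2.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 4.0, 2.0],
    [0.0, 0.0, 2.0, 4.0],
])
for left, right in logits_to_hierarchy(toy_logits):
    print(left, "+", right)
```

On this toy input the two confusable pairs are merged first and then joined at the root. The sketch works only on a logits matrix, with no training or fine-tuning, which matches the claim that the procedure applies to any pre-trained model that outputs logits.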

From Logits to Hierarchies: Hierarchical Clustering made Simple

Hierarchical information clustering by means of topologically embedded graphs

Scalable Hierarchical Clustering by Composition Rank Vector Encoding and Tree Structure

Hierarchical Clustering: Objective Functions and Algorithms

Effective hierarchical clustering based on structural similarities in nearest neighbor graphs

Learning Visual Hierarchies with Hyperbolic Embeddings

Exploring and Exploiting Hierarchical Structures for Large-Scale Classification

Objective-Based Hierarchical Clustering of Deep Embedding Vectors

Scalable Hierarchical Agglomerative Clustering

Learning Hierarchical Graph Neural Networks for Image Clustering

Hierarchical Information-Theoretic Co-Clustering For High Dimensional Data

Hierarchical Overlapping Clustering of Network Data Using Cut Metrics

A Novel Hierarchical Clustering Approach Based on Universal Gravitation

Cost-effective Hierarchical Clustering with Local Density Peak Detection

Hierarchical Block Structures and High-resolution Model Selection in Large Networks

Improved Hierarchical Clustering on Massive Datasets with Broad Guarantees

How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model

Order preserving hierarchical agglomerative clustering

Data Structures & Algorithms for Exact Inference in Hierarchical Clustering

Contrastive Hierarchical Clustering