Abstract: Fine-grained classification of whole slide images (WSIs) is essential in precision oncology, enabling precise cancer diagnosis and personalized treatment strategies. The core of this task involves distinguishing subtle morphological variations within the same broad category of gigapixel-resolution images, which presents a significant challenge. While the multi-instance learning (MIL) paradigm alleviates the computational burden of WSIs, existing MIL methods often overlook hierarchical label correlations, treating fine-grained classification as a flat multi-class classification task. To overcome these limitations, we introduce a novel hierarchical multi-instance learning (HMIL) framework. By hierarchically aligning the inherent relationships between labels at different levels of the hierarchy, at both the instance and bag levels, our approach provides a more structured and informative learning process. Specifically, HMIL incorporates a class-wise attention mechanism that aligns hierarchical information at both the instance and bag levels. Furthermore, we introduce supervised contrastive learning to enhance the discriminative capability for fine-grained classification and a curriculum-based dynamic weighting module to adaptively balance the hierarchical features during training. Extensive experiments on our large-scale cytology cervical cancer (CCC) dataset and two public histology datasets, BRACS and PANDA, demonstrate the state-of-the-art class-wise and overall performance of our HMIL framework. Our source code is available at https://github.com/ChengJin-git/HMIL.
What problem does this paper attempt to address?
The paper addresses the failure of existing methods to fully exploit label hierarchy information in the fine-grained classification of whole-slide images (WSIs). Specifically:
1. **Fine-grained Classification Challenges**: In precision oncology, the fine-grained classification of WSIs is crucial for accurate cancer diagnosis and personalized treatment strategies. It requires distinguishing subtle morphological changes within the same broad category, and because these images are usually at gigapixel-level resolution, the task is extremely challenging.
2. **Limitations of Multi-Instance Learning (MIL)**: Although MIL methods reduce the computational burden of WSIs, existing MIL methods usually ignore the correlations between hierarchical labels. They treat fine-grained classification as a flat multi-class classification task and therefore cannot make effective use of the hierarchical structure of the labels.
To solve these problems, the authors propose a novel hierarchical multi-instance learning (HMIL) framework. HMIL provides a more structured and informative learning process by hierarchically aligning the inherent relationships between labels at different levels, at both the instance and bag levels. Specific contributions include:
- **Hierarchical Attention Mechanism**: Applying class-wise attention at the instance and bag levels to align hierarchical information.
- **Supervised Contrastive Learning**: Enhancing the discriminative ability for fine-grained classification.
- **Curriculum-based Dynamic Weighting Module**: Adaptively balancing the influence of hierarchical features during training.
Through these improvements, the HMIL framework handles the fine-grained classification of WSIs more effectively and improves the model's ability to distinguish subtle cancer subtypes.
### Formula Summary
1. **Cross-Entropy Loss Function**:
\[
L_{ce}^{(c,f)}=-\sum_{i = 1}^{K_{c,f}}Y_i\log(\hat{Y}_i)
\]
where \(Y\) is the true label, \(\hat{Y}\) is the predicted probability distribution, and \(K_{c,f}\) is the number of classes at the coarse-grained (\(c\)) or fine-grained (\(f\)) level.
2. **Instance-level Hierarchical Alignment Loss**:
\[
L_{ia}=\frac{1}{N_i}(1 - \cos(A_{i,c},MA_{i,f}))
\]
where \(\cos\) denotes cosine similarity, and \(M\) is a mapping matrix that converts fine-grained attention scores to the coarse-grained hierarchy.
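
The instance-level alignment term can be sketched in plain Python as follows. This is an illustrative reading of the formula, not the authors' implementation: it assumes \(M\) is a binary \(K_c \times K_f\) matrix built from a parent lookup, and it reads the \(1/N_i\) factor as a mean over instances. The helper names (`build_mapping`, `instance_alignment_loss`) and the toy hierarchy are hypothetical.

```python
import math

def build_mapping(fine_to_coarse, num_coarse):
    """Assumed form of M: binary K_c x K_f matrix, M[c][f] = 1 if fine class f has parent c."""
    M = [[0.0] * len(fine_to_coarse) for _ in range(num_coarse)]
    for f, c in enumerate(fine_to_coarse):
        M[c][f] = 1.0
    return M

def matvec(M, v):
    """Matrix-vector product M v for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def cosine(u, v):
    """Cosine similarity; returns 0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def instance_alignment_loss(coarse_attn, fine_attn, M):
    """Mean over instances of 1 - cos(A_c, M A_f) (the averaging is an assumption)."""
    terms = [1.0 - cosine(a_c, matvec(M, a_f))
             for a_c, a_f in zip(coarse_attn, fine_attn)]
    return sum(terms) / len(terms)

# Toy hierarchy: fine classes 0 and 1 share coarse parent 0; fine class 2 has parent 1.
M = build_mapping([0, 0, 1], num_coarse=2)
aligned = instance_alignment_loss([[0.5, 0.5]], [[0.2, 0.3, 0.5]], M)
misaligned = instance_alignment_loss([[1.0, 0.0]], [[0.0, 0.0, 1.0]], M)
```

In this toy case the loss is near zero when the coarse attention matches the fine attention summed within each parent class, and it reaches one when the two point at different parents.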
3. **Bag-level Hierarchical Alignment Loss**:
\[
L_{ba}=-\sum_{i = 1}^{K_c}Y_i^{(c)}\log(\tilde{Y}_i^{(c)})
\]
where \(Y_i^{(c)}\) is the true label of the coarse-grained category, and \(\tilde{Y}^{(c)} = Mp_f\) is the predicted coarse-grained probability, derived from the fine-grained probability \(p_f\) through the mapping matrix \(M\).
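
A minimal sketch of the bag-level term, under the assumptions that \(p_f\) comes from a softmax over fine-grained logits, that \(M\) is binary, and that the coarse label is one-hot (so the sum reduces to the negative log-probability of the true coarse class). The function names and the small `eps` guard are hypothetical additions for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def coarse_from_fine(M, p_fine):
    """Compute M p_f: sum fine-grained probabilities within each coarse class."""
    return [sum(m * p for m, p in zip(row, p_fine)) for row in M]

def bag_alignment_loss(M, fine_logits, coarse_label, eps=1e-12):
    """Cross-entropy between a one-hot coarse label and M p_f (eps avoids log(0))."""
    p_coarse = coarse_from_fine(M, softmax(fine_logits))
    return -math.log(p_coarse[coarse_label] + eps)

# Assumed toy mapping: fine classes 0 and 1 belong to coarse class 0, fine class 2 to coarse class 1.
M = [[1.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
p_coarse = coarse_from_fine(M, softmax([0.0, 0.0, 0.0]))
loss = bag_alignment_loss(M, [0.0, 0.0, 0.0], coarse_label=0)
```

With uniform fine-grained probabilities, coarse class 0 receives probability 2/3 because two of the three fine classes map to it, so the loss for coarse label 0 is \(-\log(2/3)\).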
4. **Supervised Contrastive Loss**:
\[
L_{reg}=\sum_{i = 1}^b\frac{- 1}{|P_i|}\sum_{B_{p,f}\in P_i}\log\frac{\exp(B_{i,f}\cdot B_{p,f}^\top/\tau)}{\sum_{B_{o,f}\in V_i}\exp(B_{i,f}\cdot B_{o,f}^\top/\tau)}
\]
where \(V_i\) is the set of features in the current batch except \(B_{i,f}\), \(P_i\) is the set of features with the same fine-grained label as \(B_{i,f}\), and the temperature hyperparameter \(\tau\) is set to 0.1.
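
The supervised contrastive term can be sketched directly from the formula. Two choices below are assumptions rather than details stated in the source: bag features are L2-normalized before the dot product, and anchors with no positive in the batch are skipped. The function name `supcon_loss` and the toy batch are hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (normalization is an assumption here)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def supcon_loss(features, labels, tau=0.1):
    """Sum over anchors i of -1/|P_i| * sum_{p in P_i} log softmax over V_i of (B_i . B_p / tau)."""
    feats = [l2_normalize(f) for f in features]
    b = len(feats)
    total = 0.0
    for i in range(b):
        others = [o for o in range(b) if o != i]                   # V_i: batch without the anchor
        positives = [p for p in others if labels[p] == labels[i]]  # P_i: same fine-grained label
        if not positives:
            continue  # assumption: anchors without positives contribute nothing
        sims = {o: sum(x * y for x, y in zip(feats[i], feats[o])) / tau for o in others}
        peak = max(sims.values())
        # log of the denominator, computed with the log-sum-exp trick for stability
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
    return total

# Toy batch: two clusters of identical unit vectors.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
separated = supcon_loss(feats, [0, 0, 1, 1])  # labels agree with the clusters
mixed = supcon_loss(feats, [0, 1, 0, 1])      # labels cut across the clusters
```

When labels agree with the feature clusters the loss is close to zero, and it grows sharply when same-label features are orthogonal, which is the pressure that pulls bags of the same fine-grained class together.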
5. **Total Loss Function**:
\[
L=\beta\cdot(L_{ce}^{(c)}+L_{ia}+L_{ba})+(1 -