Abstract: Background and objectives: Whole slide image (WSI) classification is of great clinical significance in computer-aided pathological diagnosis. Due to the high cost of manual annotation, weakly supervised WSI classification methods have gained increasing attention. As the most representative of these, multiple instance learning (MIL) generally aggregates the predictions or features of the patches within a WSI to achieve slide-level classification under the weak supervision of WSI labels. However, most existing MIL methods ignore the spatial position relationships among the patches, even though this information is likely to strengthen the discriminative ability of WSI-level features. Methods: In this paper, we propose a novel positional encoding-guided transformer-based multiple instance learning (PEGTB-MIL) method for histopathology WSI classification. It aims to encode the spatial positional property of each patch into its corresponding semantic features and to explore the potential correlation among the patches in order to improve WSI classification performance. Concretely, the deep features of the patches in a WSI are first extracted, and simultaneously a position encoder encodes the 2D spatial positional information of the patches into spatial-aware features. After the semantic features and spatial embeddings are fused, multi-head self-attention (MHSA) is applied to explore the contextual and spatial dependencies of the fused features. In particular, we introduce an auxiliary reconstruction task to enhance the spatial–semantic consistency and generalization ability of the features. Results: The proposed method is evaluated on two public benchmark TCGA datasets (TCGA-LUNG and TCGA-BRCA) and two in-house clinical datasets (USTC-EGFR and USTC-GIST). Experimental results validate its effectiveness in the tasks of cancer subtyping and gene mutation status prediction.
In the test stage, the proposed PEGTB-MIL outperforms the other state-of-the-art methods, achieving an area under the receiver operating characteristic (ROC) curve (AUC) of 97.13±0.34%, 86.74±2.64%, 83.25±1.65%, and 72.52±1.63% on TCGA-LUNG, TCGA-BRCA, USTC-EGFR, and USTC-GIST, respectively. Conclusion: PEGTB-MIL utilizes positional encoding to effectively guide and reinforce MIL, leading to enhanced performance on downstream WSI classification tasks. Specifically, the introduced auxiliary reconstruction module preserves the spatial–semantic consistency of patch features. More significantly, this study investigates the relationship between position information and disease diagnosis and presents a promising avenue for further research.
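
As a rough illustration of the pipeline described in the Methods, the following Python sketch fuses patch features with a 2D positional encoding, applies self-attention, and pools the result into a slide-level embedding. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the form of the position encoder, so a fixed sinusoidal encoding is assumed here; fusion is assumed to be element-wise addition; a single attention head with identity projections stands in for MHSA; mean pooling stands in for the aggregation step; and the auxiliary reconstruction task is omitted. All function names are hypothetical.

```python
import math

def sincos_1d(pos, dim):
    """Sinusoidal encoding of one scalar position into `dim` values (dim must be even)."""
    out = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        out.extend([math.sin(pos * freq), math.cos(pos * freq)])
    return out

def encode_position_2d(row, col, dim):
    """Assumed encoder: half the channels encode the row, half the column (dim divisible by 4)."""
    return sincos_1d(row, dim // 2) + sincos_1d(col, dim // 2)

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections
    (a simplification of multi-head self-attention with learned projections)."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out

def slide_embedding(features, coords):
    """Fuse patch features with positional encodings (assumed: addition),
    attend across patches, then mean-pool into one slide-level vector."""
    dim = len(features[0])
    fused = [
        [f + p for f, p in zip(feat, encode_position_2d(r, c, dim))]
        for feat, (r, c) in zip(features, coords)
    ]
    attended = self_attention(fused)
    n = len(attended)
    return [sum(tok[j] for tok in attended) / n for j in range(dim)]

# Toy usage: three patches with 8-dimensional features and (row, col) grid positions.
features = [[0.1 * (i + j) for j in range(8)] for i in range(3)]
coords = [(0, 0), (0, 1), (1, 0)]
embedding = slide_embedding(features, coords)
```

In a trained model the resulting slide-level vector would be passed to a classifier head, and the encoder, attention projections, and classifier would be learned jointly under the slide-level label.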

Multi-class Cancer Classification of Whole Slide Images Through Transformer and Multiple Instance Learning

Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification

Unsupervised mutual transformer learning for multi-gigapixel whole slide image classification

Attention Multiple Instance Learning with Transformer Aggregation for Breast Cancer Whole Slide Image Classification

Transformer based multiple instance learning for WSI breast cancer classification

TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification

Multi-Scale Prototypical Transformer for Whole Slide Image Classification

Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics

Clustering-Based Multi-instance Learning Network for Whole Slide Image Classification

Transformer-Based Video-Structure Multi-Instance Learning for Whole Slide Image Classification

Histopathological Image Classification based on Self-Supervised Vision Transformer and Weak Labels

MG-Trans: Multi-Scale Graph Transformer with Information Bottleneck for Whole Slide Image Classification

Self-supervised Comparative Learning Based Improved Multiple Instance Learning for Whole Slide Image Classification

RetMIL: Retentive Multiple Instance Learning for Histopathological Whole Slide Image Classification

Positional Encoding-Guided Transformer-Based Multiple Instance Learning for Histopathology Whole Slide Images Classification

FR-MIL: Distribution Re-calibration based Multiple Instance Learning with Transformer for Whole Slide Image Classification

Multi-scale Multi-Instance Contrastive Learning for Whole Slide Image Classification

Multi-scale representation attention based deep multiple instance learning for gigapixel whole slide image analysis

Weakly Supervised Breast Cancer Classification on WSI Using Transformer and Graph Attention Network

Advances in Multiple Instance Learning for Whole Slide Image Analysis: Techniques, Challenges, and Future Directions