Abstract: Background and objectives: Whole slide image (WSI) classification is of great clinical significance in computer-aided pathological diagnosis. Because manual annotation is costly, weakly supervised WSI classification methods have attracted increasing attention. The most representative of these, multiple instance learning (MIL), generally aggregates the predictions or features of the patches within a WSI to achieve slide-level classification under the weak supervision of WSI labels. However, most existing MIL methods ignore the spatial position relationships among patches, even though this information is likely to strengthen the discriminative ability of WSI-level features.

Methods: We propose a novel positional encoding-guided transformer-based multiple instance learning (PEGTB-MIL) method for histopathology WSI classification. It encodes the spatial position of each patch into its corresponding semantic features and explores the potential correlations among patches to improve WSI classification performance. Concretely, deep features of the patches in a WSI are first extracted, while a position encoder encodes the 2D spatial positions of the patches into spatial-aware features. After the semantic features are fused with these spatial embeddings, multi-head self-attention (MHSA) is applied to explore the contextual and spatial dependencies of the fused features. In addition, an auxiliary reconstruction task is introduced to enhance the spatial–semantic consistency and generalization ability of the features.

Results: The proposed method is evaluated on two public benchmark TCGA datasets (TCGA-LUNG and TCGA-BRCA) and two in-house clinical datasets (USTC-EGFR and USTC-GIST). Experimental results validate its effectiveness in cancer subtyping and gene mutation status prediction. In the test stage, PEGTB-MIL outperforms the other state-of-the-art methods, achieving areas under the receiver operating characteristic (ROC) curve (AUC) of 97.13±0.34%, 86.74±2.64%, 83.25±1.65%, and 72.52±1.63% on the four datasets, respectively.

Conclusion: PEGTB-MIL uses positional encoding to guide and reinforce MIL, leading to improved performance on downstream WSI classification tasks. Specifically, the auxiliary reconstruction module preserves the spatial–semantic consistency of patch features. More significantly, this study investigates the relationship between position information and disease diagnosis and presents a promising avenue for further research.
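
The pipeline described in the Methods part of the abstract (patch features, 2D position encoding, fusion, MHSA, auxiliary reconstruction) can be illustrated with a rough NumPy sketch. This is not the authors' implementation, and the abstract does not specify the details, so several choices here are assumptions made purely for illustration: a fixed sinusoidal position encoder, additive fusion, mean pooling for the slide-level feature, and a reconstruction head that regresses the positional embedding from the contextualized features. All function and parameter names (`sincos_2d`, `forward`, `params`) are hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def sincos_2d(coords, dim):
    """Fixed 2D sinusoidal encoding (assumed; the paper's encoder may differ).
    Half of the channels encode x, the other half encode y."""
    assert dim % 4 == 0, "dim must be divisible by 4"
    quarter = dim // 4
    freqs = 1.0 / (10000.0 ** (np.arange(quarter) / quarter))  # (dim/4,)
    x = coords[:, :1] * freqs    # (N, dim/4) via broadcasting
    y = coords[:, 1:2] * freqs   # (N, dim/4)
    return np.concatenate([np.sin(x), np.cos(x), np.sin(y), np.cos(y)], axis=1)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(h, wq, wk, wv, wo, num_heads):
    """Standard scaled dot-product MHSA over the N patch tokens of one slide."""
    n, d = h.shape
    head_dim = d // num_heads

    def split(m):  # (N, D) -> (heads, N, head_dim)
        return m.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(h @ wq), split(h @ wk), split(h @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(head_dim))  # (heads, N, N)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)             # merge heads
    return out @ wo

def forward(features, coords, params, num_heads=4):
    pos = sincos_2d(coords, features.shape[1])
    fused = features + pos                      # assumption: additive fusion
    ctx = fused + multi_head_self_attention(fused, *params["attn"], num_heads)
    slide_feature = ctx.mean(axis=0)            # assumption: mean pooling
    logits = slide_feature @ params["cls"]      # slide-level class scores
    # Assumed auxiliary task: recover the positional embedding from the tokens.
    recon = ctx @ params["rec"]
    recon_loss = float(np.mean((recon - pos) ** 2))
    return logits, recon_loss

# Toy inputs: 6 patches with 16-dim features and integer grid coordinates.
rng = np.random.default_rng(0)
n_patches, dim, n_classes = 6, 16, 2
features = rng.normal(size=(n_patches, dim))
coords = rng.integers(0, 50, size=(n_patches, 2)).astype(float)
params = {
    "attn": [rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(4)],
    "cls": rng.normal(scale=dim ** -0.5, size=(dim, n_classes)),
    "rec": rng.normal(scale=dim ** -0.5, size=(dim, dim)),
}
logits, recon_loss = forward(features, coords, params)
```

In a trained model, the classification loss on `logits` and the auxiliary `recon_loss` would presumably be combined into one objective; how the two terms are weighted is not stated in the abstract.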

MG-Trans: Multi-Scale Graph Transformer with Information Bottleneck for Whole Slide Image Classification

Multi-scale Efficient Graph-Transformer for Whole Slide Image Classification

Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification

TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification

TGMIL: A hybrid multi-instance learning model based on the Transformer and the Graph Attention Network for whole-slide images classification of renal cell carcinoma

Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics

Multi-Scale Prototypical Transformer for Whole Slide Image Classification

Multi-class Cancer Classification of Whole Slide Images Through Transformer and Multiple Instance Learning

Unsupervised Mutual Transformer Learning for Multi-Gigapixel Whole Slide Image Classification

Integrative Graph-Transformer Framework for Histopathology Whole Slide Image Representation and Classification

Positional Encoding-Guided Transformer-Based Multiple Instance Learning for Histopathology Whole Slide Images Classification

Attention Multiple Instance Learning with Transformer Aggregation for Breast Cancer Whole Slide Image Classification

SETMIL: Spatial Encoding Transformer-Based Multiple Instance Learning for Pathological Image Analysis

RetMIL: Retentive Multiple Instance Learning for Histopathological Whole Slide Image Classification

MGCT: Mutual-Guided Cross-Modality Transformer for Survival Outcome Prediction using Integrative Histopathology-Genomic Features

Multi-scale Multi-Instance Contrastive Learning for Whole Slide Image Classification

Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images

FR-MIL: Distribution Re-calibration based Multiple Instance Learning with Transformer for Whole Slide Image Classification

Trusted multi-scale classification framework for whole slide image

MulGT: Multi-task Graph-Transformer with Task-aware Knowledge Injection and Domain Knowledge-driven Pooling for Whole Slide Image Analysis