Abstract: Background and objectives: Whole slide image (WSI) classification is of great clinical significance in computer-aided pathological diagnosis. Due to the high cost of manual annotation, weakly supervised WSI classification methods have gained increasing attention. As the most representative approach, multiple instance learning (MIL) generally aggregates the predictions or features of the patches within a WSI to achieve slide-level classification under the weak supervision of WSI labels. However, most existing MIL methods ignore the spatial position relationships among patches, information that is likely to strengthen the discriminative ability of WSI-level features. Methods: In this paper, we propose a novel positional encoding-guided transformer-based multiple instance learning (PEGTB-MIL) method for histopathology WSI classification. It aims to encode the spatial position of each patch into its corresponding semantic features and to explore the potential correlations among patches in order to improve WSI classification performance. Concretely, the deep features of the patches in a WSI are first extracted, and simultaneously a position encoder encodes the 2D spatial positions of the patches into spatial-aware features. After the semantic features and spatial embeddings are fused, multi-head self-attention (MHSA) is applied to explore the contextual and spatial dependencies of the fused features. In addition, we introduce an auxiliary reconstruction task to enhance the spatial–semantic consistency and generalization ability of the features. Results: The proposed method is evaluated on two public benchmark TCGA datasets (TCGA-LUNG and TCGA-BRCA) and two in-house clinical datasets (USTC-EGFR and USTC-GIST). Experimental results validate that it is effective in the tasks of cancer subtyping and gene mutation status prediction. In the test stage, the proposed PEGTB-MIL outperforms other state-of-the-art methods, achieving areas under the receiver operating characteristic (ROC) curve (AUC) of 97.13±0.34%, 86.74±2.64%, 83.25±1.65%, and 72.52±1.63% on the four datasets, respectively. Conclusion: PEGTB-MIL uses positional encoding to effectively guide and reinforce MIL, leading to enhanced performance on downstream WSI classification tasks. Specifically, the introduced auxiliary reconstruction module preserves the spatial–semantic consistency of patch features. More significantly, this study investigates the relationship between position information and disease diagnosis and presents a promising avenue for further research.
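
The pipeline described in the Methods part (encode each patch's 2D position, fuse it with the patch's semantic feature, then let self-attention relate the fused tokens) can be illustrated with a minimal sketch. This is not the PEGTB-MIL implementation: the fixed sinusoidal encoding stands in for the paper's position encoder, the fusion is assumed to be element-wise addition, the attention is single-head with identity projections rather than learned MHSA, the slide-level feature is a plain mean, and the auxiliary reconstruction task is omitted. All function names are hypothetical, and only the Python standard library is used.

```python
import math


def sincos_1d(pos, dim):
    """Sinusoidal encoding of one scalar coordinate into `dim` values (dim must be even)."""
    out = []
    for i in range(0, dim, 2):
        freq = 1.0 / (10000 ** (i / dim))
        out.extend([math.sin(pos * freq), math.cos(pos * freq)])
    return out


def encode_position_2d(row, col, dim):
    """Assumed stand-in for the position encoder: half of the channels
    encode the patch row, the other half the column (dim divisible by 4)."""
    half = dim // 2
    return sincos_1d(row, half) + sincos_1d(col, half)


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity Q/K/V
    projections (a simplification of learned multi-head self-attention)."""
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    attended = []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        attended.append(
            [sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(dim)]
        )
    return attended


def slide_embedding(features, coords):
    """Fuse patch features with position encodings by addition (an assumption),
    apply self-attention, and mean-pool the result into one slide-level vector."""
    dim = len(features[0])
    fused = [
        [f + p for f, p in zip(feat, encode_position_2d(row, col, dim))]
        for feat, (row, col) in zip(features, coords)
    ]
    attended = self_attention(fused)
    return [sum(tok[j] for tok in attended) / len(attended) for j in range(dim)]
```

A call such as `slide_embedding(patch_features, patch_coords)` takes one feature vector and one `(row, col)` grid coordinate per patch and returns a single vector of the same dimension, which a classifier head could then consume. Because the position encoding is added before attention, two patches with identical semantic features but different locations produce different tokens, which is the property the abstract argues most MIL methods lack.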

Position-Aware Masked Autoencoder for Histopathology WSI Representation Learning

Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder

Positional Encoding-Guided Transformer-Based Multiple Instance Learning for Histopathology Whole Slide Images Classification

Prompt-Guided Adaptive Model Transformation for Whole Slide Image Classification

Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis

Unleash the Power of State Space Model for Whole Slide Image with Local Aware Scanning and Importance Resampling

Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics

Multi-modal Masked Autoencoders Learn Compositional Histopathological Representations

RetMIL: Retentive Multiple Instance Learning for Histopathological Whole Slide Image Classification

TDT-MIL: a framework with a dual-channel spatial positional encoder for weakly-supervised whole slide image classification

Dual-Attention Multiple Instance Learning Framework for Pathology Whole-Slide Image Classification

SETMIL: Spatial Encoding Transformer-Based Multiple Instance Learning for Pathological Image Analysis

Multimodal Co-Attention Transformer for Survival Prediction in Gigapixel Whole Slide Images

Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification

TPMIL: Trainable Prototype Enhanced Multiple Instance Learning for Whole Slide Image Classification

Masked Autoencoders with Handcrafted Feature Predictions: Transformer for Weakly Supervised Esophageal Cancer Classification

Self-Supervised Representation Distribution Learning for Reliable Data Augmentation in Histopathology WSI Classification

Gigapixel Whole-Slide Images Classification using Locally Supervised Learning

Attention Multiple Instance Learning with Transformer Aggregation for Breast Cancer Whole Slide Image Classification

Unsupervised Mutual Transformer Learning for Multi-Gigapixel Whole Slide Image Classification

Multi-scale representation attention based deep multiple instance learning for gigapixel whole slide image analysis