Abstract: Whole Slide Image (WSI) classification with multiple instance learning (MIL) in digital pathology faces significant computational challenges. Current methods mostly rely on extensive self-supervised learning (SSL) for satisfactory performance, requiring long training periods and considerable computational resources. At the same time, omitting pre-training degrades performance because of the domain shift from natural images to WSIs. We introduce the Snuffy architecture, a novel MIL-pooling method based on sparse transformers that mitigates performance loss with limited pre-training and enables continual few-shot pre-training as a competitive option. Our sparsity pattern is tailored for pathology and is theoretically proven to be a universal approximator with the tightest probabilistic sharp bound on the number of layers for sparse transformers to date. We demonstrate Snuffy's effectiveness on the CAMELYON16 and TCGA Lung cancer datasets, achieving superior WSI and patch-level accuracies. The code is available at https://github.com/jafarinia/snuffy.

What problem does this paper attempt to address?

The paper addresses the significant computational challenges of classifying whole-slide images (WSIs) with multiple-instance learning (MIL) in digital pathology. Specifically:

1. **High demand for computational resources**: Most current methods rely on extensive self-supervised learning (SSL), which requires long training periods and considerable computational resources.
2. **Performance degradation**: Missing or insufficient pre-training degrades performance because of the domain shift between natural image datasets (such as ImageNet-1K) and WSIs.

To solve these problems, the authors propose the Snuffy architecture, a new MIL-pooling method based on sparse transformers. Its main features are:

- **Reducing computational requirements**: Sparse transformers combined with continual few-shot self-supervised pre-training greatly reduce the computational resources needed to train the embeddings.
- **Enhancing expressive ability**: A new bio-driven sparsity pattern, tailored for pathology, is theoretically proven to be a universal approximator, with the tightest probabilistic sharp bound to date on the number of layers for sparse transformers.
- **Supporting continual few-shot pre-training**: Continual few-shot pre-training becomes a viable and competitive option that balances efficiency and performance.

Snuffy achieves superior WSI-level and patch-level accuracy on the CAMELYON16 and TCGA lung cancer datasets, reaching a new state of the art in MIL tasks.

### Summary of main contributions:

1. **Continual self-supervised pre-training**: Continual SSL pre-training carries an ImageNet-1K pre-trained model over to the pathology dataset, using adapters to significantly reduce pre-training computation time.
2. **New bio-driven sparsity pattern**: The pattern is proven to preserve universal approximation, with a tight probabilistic bound on the number of layers required.
3. **Significantly improved WSI classification metrics**: New state-of-the-art results in WSI classification (AUC 0.987) and ROI detection (FROC 0.675).
4. **Extensive verification**: Verified on recognized benchmark datasets (CAMELYON16 and TCGA lung cancer), demonstrating consistent and superior performance.

These improvements make the Snuffy architecture perform well in WSI classification tasks and suggest potential for clinical applications.
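To make the idea of MIL pooling with sparse attention concrete, here is a minimal pure-Python sketch. It is not Snuffy's actual sparsity pattern or architecture: the function name `sparse_attention_pool`, the scoring vector `w`, and the simple top-k selection rule are all assumptions chosen for illustration. It only shows the general shape of the technique: score every patch embedding in a slide (the "bag"), let a small subset contribute, and combine that subset into one slide-level embedding.

```python
import math


def sparse_attention_pool(patches, w, k):
    """Pool a bag of patch embeddings into one slide-level embedding.

    patches: list of equal-length float lists, one per patch (hypothetical input).
    w: hypothetical scoring vector with the same length as a patch embedding.
    k: number of highest-scoring patches allowed to contribute.
    Returns (pooled_embedding, per_patch_weights).
    """
    if not patches:
        raise ValueError("bag must contain at least one patch")
    k = max(1, min(k, len(patches)))

    # Raw attention score per patch: dot product with the scoring vector.
    scores = [sum(x * wi for x, wi in zip(p, w)) for p in patches]

    # Sparsity step (illustrative assumption): keep only the k largest scores.
    kept = sorted(range(len(patches)), key=lambda i: scores[i], reverse=True)[:k]

    # Numerically stable softmax over the retained scores only.
    top = max(scores[i] for i in kept)
    exps = {i: math.exp(scores[i] - top) for i in kept}
    total = sum(exps.values())
    weights = [exps[i] / total if i in exps else 0.0 for i in range(len(patches))]

    # Weighted sum of the retained patch embeddings.
    dim = len(patches[0])
    pooled = [sum(weights[i] * patches[i][d] for i in kept) for d in range(dim)]
    return pooled, weights


# Toy bag of three 2-dimensional patch embeddings.
bag = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
pooled, weights = sparse_attention_pool(bag, w=[1.0, 0.0], k=2)
```

Patches outside the retained set receive a weight of exactly zero, so the weighted sum only touches `k` embeddings per slide. In this toy version the scoring pass is still linear in the number of patches; a sparse transformer applies the same principle to patch-to-patch attention, where restricting which pairs interact is what saves computation.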

Snuffy: Efficient Whole Slide Image Classifier

RoFormer for Position Aware Multiple Instance Learning in Whole Slide Image Classification

The Whole Pathological Slide Classification via Weakly Supervised Learning

Unsupervised Mutual Transformer Learning for Multi-Gigapixel Whole Slide Image Classification

Iterative multiple instance learning for weakly annotated whole slide image classification

RetMIL: Retentive Multiple Instance Learning for Histopathological Whole Slide Image Classification

FR-MIL: Distribution Re-calibration based Multiple Instance Learning with Transformer for Whole Slide Image Classification

Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

An Efficient Cervical Whole Slide Image Analysis Framework Based on Multi-scale Semantic and Location Deep Features

TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification

DTFD-MIL: Double-Tier Feature Distillation Multiple Instance Learning for Histopathology Whole Slide Image Classification

Learning Binary and Sparse Permutation-Invariant Representations for Fast and Memory Efficient Whole Slide Image Search

A universal multiple instance learning framework for whole slide image analysis

FourierMIL: Fourier filtering-based multiple instance learning for whole slide image analysis

Semantic-Similarity Collaborative Knowledge Distillation Framework for Whole Slide Image Classification

Gigapixel Whole-Slide Images Classification using Locally Supervised Learning

Multi-Cohort Framework with Cohort-Aware Attention and Adversarial Mutual-Information Minimization for Whole Slide Image Classification

Whole Slide Images based Cancer Survival Prediction using Attention Guided Deep Multiple Instance Learning Networks

An efficient framework based on large foundation model for cervical cytopathology whole slide image screening

Multi-level Multiple Instance Learning with Transformer for Whole Slide Image Classification