Abstract: Computational pathology can lead to saving human lives, but models are annotation hungry and pathology images are notoriously expensive to annotate. Self-supervised learning (SSL) has been shown to be an effective method for utilizing unlabeled data, and its application to pathology could greatly benefit its downstream tasks. Yet, there are no principled studies that compare SSL methods and discuss how to adapt them for pathology. To address this need, we execute the largest-scale study of SSL pre-training on pathology image data to date. Our study is conducted using 4 representative SSL methods on diverse downstream tasks. We establish that large-scale domain-aligned pre-training in pathology consistently outperforms ImageNet pre-training in standard SSL settings such as linear and fine-tuning evaluations, as well as in low-label regimes. Moreover, we propose a set of domain-specific techniques that we experimentally show lead to a performance boost. Lastly, for the first time, we apply SSL to the challenging task of nuclei instance segmentation and show large and consistent performance improvements under diverse settings.

What problem does this paper attempt to address?

The paper asks how large-scale self-supervised learning (SSL) pre-training on pathology image data can improve the performance of downstream pathology tasks, especially when labeled data is scarce. Specifically, it focuses on the following aspects:

1. **Improving the performance of pathology tasks**: The paper examines whether self-supervised pre-training on large-scale unlabeled pathology images improves downstream tasks, such as pathology image classification and nuclei instance segmentation, more effectively than starting from ImageNet pre-trained weights.
2. **Adapting to the characteristics of pathology images**: Pathology images differ significantly from natural images (e.g., no canonical orientation, low color variation, and different interpretations at different fields of view). The paper therefore proposes a set of pathology-specific data augmentation techniques that account for these characteristics and improve the effect of self-supervised learning.
3. **Label efficiency**: In pathology, obtaining high-quality labeled data is both expensive and time-consuming. The paper therefore evaluates self-supervised pre-trained models when fine-tuned with a small amount of labeled data, and verifies their advantage in label efficiency.
4. **Multiple fields of view**: Pathology tasks may require different fields of view (FoV), for example high resolution for cell classification and low resolution for tissue structure analysis. The paper explores how to enable pre-trained models to handle tasks at different fields of view.

Through these studies, the paper aims to provide a comprehensive benchmark for self-supervised learning in pathology and to propose effective techniques that advance the field.
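The augmentation and field-of-view points above can be made concrete with a small sketch. The summary does not spell out the paper's exact augmentations, so the code below is only an illustration under stated assumptions: it treats "no canonical orientation" as permission to sample any of the eight rotation and flip views of a tile, and it imitates a narrower field of view with a center crop. The function names (`rotate90`, `orientation_free_view`, `center_crop`) are hypothetical, and tiles are plain nested lists to keep the example dependency-free; a real pipeline would more likely operate on image arrays.

```python
import random


def rotate90(tile):
    """Rotate a 2D tile (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*tile[::-1])]


def horizontal_flip(tile):
    """Mirror a 2D tile left to right."""
    return [list(row[::-1]) for row in tile]


def orientation_free_view(tile, rng):
    """Sample one of the 8 rotation/flip views of a tile.

    Assumption (not taken from the paper): since a pathology tile has no
    canonical "up", every quarter-turn and mirror image is an equally
    valid view for an SSL method to compare.
    """
    out = [list(row) for row in tile]
    for _ in range(rng.randrange(4)):  # 0 to 3 quarter-turns
        out = rotate90(out)
    if rng.random() < 0.5:             # optional mirror
        out = horizontal_flip(out)
    return out


def center_crop(tile, size):
    """Keep the central size x size region, i.e. a narrower field of view.

    Assumption: a real pipeline would resize the crop back to the model's
    input resolution; that step is omitted here.
    """
    height, width = len(tile), len(tile[0])
    top, left = (height - size) // 2, (width - size) // 2
    return [row[left:left + size] for row in tile[top:top + size]]


if __name__ == "__main__":
    rng = random.Random(0)
    tile = [[r * 4 + c for c in range(4)] for r in range(4)]
    print(orientation_free_view(tile, rng))
    print(center_crop(tile, 2))
```

Sampling from the full set of rotations and flips changes only the arrangement of pixels, never their values, which is why it is a plausible fit for images without a preferred orientation. Color-related augmentations are deliberately left out because the summary gives no detail on how the paper handles them.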

Benchmarking Self-Supervised Learning on Diverse Pathology Datasets

Adapting Self-Supervised Learning for Computational Pathology

Domain-specific optimization and diverse evaluation of self-supervised models for histopathology

Generalizability of Self-Supervised Training Models for Digital Pathology: A Multicountry Comparison in Colorectal Cancer

Dive into the Details of Self-Supervised Learning for Medical Image Analysis

Dive into Self-Supervised Learning for Medical Image Analysis: Data, Models and Tasks

Evaluating Self-Supervised Learning in Medical Imaging: A Benchmark for Robustness, Generalizability, and Multi-Domain Impact

A Clinical Benchmark of Public Self-Supervised Pathology Foundation Models

A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification

Computational Pathology at Health System Scale -- Self-Supervised Foundation Models from Three Billion Images

Self-supervised learning for skin cancer diagnosis with limited training data

Exploring Self-Supervised Representation Learning For Low-Resource Medical Image Analysis

Improving Self-supervised Learning with Hardness-aware Dynamic Curriculum Learning: An Application to Digital Pathology

SSLP: Spatial Guided Self-supervised Learning on Pathological Images

Domain-specific Knowledge Guided Self-supervised Learning for Pathological Image Segmentation

Stain-Adaptive Self-Supervised Learning for Histopathology Image Analysis

Benchmarking Self-Supervised Learning for Single-Cell Data

Generative and Contrastive Based Self-Supervised Learning Model for Histopathology Image Analysis

SSL-CPCD: Self-supervised learning with composite pretext-class discrimination for improved generalisability in endoscopic image analysis

Transformer-based unsupervised contrastive learning for histopathological image classification

GenSelfDiff-HIS: Generative Self-Supervision Using Diffusion for Histopathological Image Segmentation