Abstract: Pre-training deep learning models on large datasets of natural images, such as ImageNet, has become the standard for endoscopic image analysis. This approach is generally superior to training from scratch because high-quality medical imagery and labels are scarce. However, it is still unknown whether features learned on natural imagery provide an optimal starting point for downstream medical endoscopic imaging tasks. Intuitively, pre-training with imagery closer to the target domain could lead to better-suited feature representations. This study evaluates whether in-domain pre-training for gastrointestinal endoscopic image analysis offers benefits compared to pre-training on natural images. To this end, we present a dataset comprising 5,014,174 gastrointestinal endoscopic images from eight different medical centers (GastroNet-5M), and exploit self-supervised learning with SimCLRv2, MoCov2 and DINO to learn relevant features for in-domain downstream tasks. The learned features are compared to features learned on natural images with multiple methods and with varying amounts of data and/or labels (e.g. billion-scale semi-weakly supervised learning and supervised learning on ImageNet-21k). The evaluation is performed on five downstream datasets designed for a variety of gastrointestinal tasks, for example GIANA for angiodysplasia detection and Kvasir-SEG for polyp segmentation. The findings indicate that self-supervised domain-specific pre-training, specifically with the DINO framework, results in better-performing models than any supervised pre-training on natural images. On the ResNet50 and Vision-Transformer-small architectures, self-supervised in-domain pre-training with DINO leads to an average performance boost of 1.63% and 4.62%, respectively, on the downstream datasets. This improvement is measured against the best performance achieved through pre-training on natural images within any of the evaluated frameworks. Moreover, the in-domain pre-trained models also exhibit increased robustness against distortion perturbations (noise, contrast, blur, etc.): the in-domain pre-trained ResNet50 and Vision-Transformer-small with DINO score on average 1.28% and 3.55% higher, respectively, on the performance metrics than the best natural-image pre-trained models. Overall, this study highlights the importance of in-domain pre-training for improving the generic nature, scalability and performance of deep learning for medical image analysis. The GastroNet-5M pre-trained weights are made publicly available in our repository: huggingface.co/tgwboers/GastroNet-5M_Pretrained_Weights.
