Abstract: NLP-based computer vision models, particularly vision transformers, have been shown to outperform CNN models in many imaging tasks. However, most digital pathology artificial-intelligence models are based on CNN architectures, probably owing to a lack of data on the performance of NLP models for pathology images. In this study, we developed digital pathology pipelines to benchmark five recently proposed NLP models (vision transformer (ViT), Swin Transformer, MobileViT, CMT, and Sequencer2D) and four popular CNN models (ResNet18, ResNet50, MobileNetV2, and EfficientNet) for predicting three biomarkers in colorectal cancer: microsatellite instability, CpG island methylator phenotype, and BRAF mutation. Hematoxylin and eosin (H&E)-stained whole-slide images from the Molecular and Cellular Oncology (MCO) and The Cancer Genome Atlas (TCGA) cohorts were used as the training and external validation datasets, respectively. Cross-study external validation revealed that the NLP-based models significantly outperformed the CNN-based models in biomarker prediction, improving overall prediction performance and precision by up to approximately 10% and 26%, respectively. Notably, compared with existing models in the literature that were trained on large datasets, our NLP models achieved state-of-the-art predictions for all three biomarkers using a relatively small training dataset. This suggests that large training datasets are not a prerequisite for NLP models or transformers, and that NLP models may be more suitable for clinical studies, in which small training datasets are common. The superior performance of Sequencer2D suggests that further research and innovation on both transformer and bidirectional long short-term memory (BiLSTM) architectures are warranted in digital pathology. NLP models can replace classic CNN architectures and become the new workhorse backbone in digital pathology.

The Importance of Downstream Networks in Digital Pathology Foundation Models

Benchmarking Pathology Feature Extractors for Whole Slide Image Classification

Benchmarking foundation models as feature extractors for weakly-supervised computational pathology

Benchmarking Embedding Aggregation Methods in Computational Pathology: A Clinical Data Perspective

Do Histopathological Foundation Models Eliminate Batch Effects? A Comparative Study

Foundation Models for Slide-level Cancer Subtyping in Digital Pathology

Histopathology image embedding based on foundation models features aggregation for patient treatment response prediction

Evaluating Computational Pathology Foundation Models for Prostate Cancer Grading under Distribution Shifts

Model-Agnostic Binary Patch Grouping for Bone Marrow Whole Slide Image Representation

A whole-slide foundation model for digital pathology from real-world data

Towards A Generalizable Pathology Foundation Model via Unified Knowledge Distillation

A Clinical Benchmark of Public Self-Supervised Pathology Foundation Models

PathoDuet: Foundation Models for Pathological Slide Analysis of H&E and IHC Stains

Unlocking the Potential of Digital Pathology: Novel Baselines for Compression

PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology

Low-resource finetuning of foundation models beats state-of-the-art in histopathology

Evolutionary Computation in Action: Feature Selection for Deep Embedding Spaces of Gigapixel Pathology Images

Comparing ImageNet Pre-training with Digital Pathology Foundation Models for Whole Slide Image-Based Survival Analysis

Time to Embrace Natural Language Processing (NLP)-based Digital Pathology: Benchmarking NLP- and Convolutional Neural Network-based Deep Learning Pipelines

How Good Are We? Evaluating Cell AI Foundation Models in Kidney Pathology with Human-in-the-Loop Enrichment

Topological Feature Extraction and Visualization of Whole Slide Images using Graph Neural Networks