Abstract: Deep learning algorithms that extract meaningful diagnostic features from biomedical images promise to improve patient care as digital pathology expands. Among these models, Vision Transformer (ViT) models have been shown to capture long-range spatial relationships and to offer more robust predictive power on image classification tasks than conventional convolutional neural network (CNN) models, along with better model interpretability. Model interpretation is important for understanding and explaining how a deep learning model makes predictions, especially when developing transparent models for digital pathology. However, like other deep learning algorithms, ViT models trained on limited annotated biomedical imaging datasets are prone to overfitting, which degrades performance and can produce false predictions driven by random noise. When predictions are driven by random noise, the model's interpretation is compromised as well. To address this issue, we introduce a novel metric, Training Attention and Validation Attention Consistency (TAVAC), for evaluating the degree of overfitting of ViT models on imaging datasets and quantifying the reproducibility of their interpretation. Specifically, interpretation reproducibility is assessed by comparing the high-attention regions in an image between training and testing. We test the method on four publicly available image classification datasets and two independent breast cancer histological image datasets. All overfitted models exhibited significantly lower TAVAC scores than the good-fit models. The TAVAC score quantitatively measures how well model interpretation generalizes at a fine-grained level, for small groups of cells in each H&E image, which traditional performance evaluation metrics such as prediction accuracy cannot provide.
Furthermore, TAVAC applies beyond medical diagnostic AI models: in basic research, it supports monitoring the reproducibility of model interpretation at pixel resolution, helping to reveal critical spatial patterns and cellular structures essential to understanding biological processes and disease mechanisms. TAVAC sets a new standard for evaluating deep learning model interpretation and provides a method for determining the significance of high-attention regions detected in the attention maps of biomedical images.
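
As a rough illustration of the idea of comparing high-attention regions between training and validation, the sketch below scores the agreement between two attention maps for the same image. It is not the paper's definition of TAVAC: the choice of Pearson correlation and top-k Jaccard overlap, the function names `pearson` and `top_k_overlap`, and the attention values are all assumptions made for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length sequences of floats."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("constant attention map; correlation undefined")
    return cov / sqrt(var_a * var_b)


def top_k_overlap(a, b, k):
    """Jaccard overlap of the k highest-attention patch indices in a and b."""
    top_a = set(sorted(range(len(a)), key=lambda i: a[i], reverse=True)[:k])
    top_b = set(sorted(range(len(b)), key=lambda i: b[i], reverse=True)[:k])
    return len(top_a & top_b) / len(top_a | top_b)


# Hypothetical per-patch attention for one image (flattened 2x3 patch grid):
# one map from a model that saw the image during training, and two candidate
# maps from models that only saw it during validation.
train_attention = [0.05, 0.40, 0.30, 0.10, 0.10, 0.05]
consistent_val = [0.06, 0.38, 0.31, 0.09, 0.11, 0.05]
inconsistent_val = [0.35, 0.05, 0.05, 0.10, 0.05, 0.40]

print(pearson(train_attention, consistent_val))
print(top_k_overlap(train_attention, consistent_val, k=2))
print(pearson(train_attention, inconsistent_val))
print(top_k_overlap(train_attention, inconsistent_val, k=2))
```

Under these assumptions, a well-generalizing model would be expected to give high agreement between the two maps, while an overfitted model would attend to different patches once the image is held out.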

ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy

Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology

Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology

Two-stage Generative Models of Simulating Training Data at the Voxel Level for Large-Scale Microscopy Bioimage Segmentation

Scaling Dense Representations for Single Cell with Transcriptome-Scale Context

CellViT: Vision Transformers for precise cell segmentation and classification

Inferring single-cell spatial gene expression with tissue morphology via explainable deep learning

SubCell: Vision foundation models for microscopy capture single-cell biology

Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design

Computational Pathology at Health System Scale -- Self-Supervised Foundation Models from Three Billion Images

Vision Transformers for Weakly-Supervised Microorganism Enumeration

Phikon-v2, A large and public feature extractor for biomarker prediction

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

How Good Are We? Evaluating Cell AI Foundation Models in Kidney Pathology with Human-in-the-Loop Enrichment

CellMemory: Hierarchical Interpretation of Out-of-Distribution Cells Using Bottlenecked Transformer

ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

Large-scale foundation model on single-cell transcriptomics

Quantifying Interpretation Reproducibility in Vision Transformer Models with TAVAC

DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology