Text-guided Foundation Model Adaptation for Long-Tailed Medical Image Classification

Sirui Li, Li Lin, Yijin Huang, Pujin Cheng, Xiaoying Tang

2024-08-27

Abstract: In medical contexts, the imbalanced data distribution in long-tailed datasets, due to scarce labels for rare diseases, greatly impairs the diagnostic accuracy of deep learning models. Recent multimodal text-image supervised foundation models offer new solutions to data scarcity through effective representation learning. However, their limited medical-specific pretraining means they perform worse on medical image classification than on natural images. To address this issue, we propose a novel Text-guided Foundation model Adaptation for Long-Tailed medical image classification (TFA-LT). We adopt a two-stage training strategy, integrating representations from the foundation model using just two linear adapters and a single ensembler for balanced outcomes. Experimental results on two long-tailed medical image datasets validate the simplicity, light weight, and efficiency of our approach: using only 6.1% of the GPU memory required by the current best-performing algorithm, our method achieves an accuracy improvement of up to 27.1%, highlighting the substantial potential of foundation model adaptation in this area.
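The components named in the abstract (two linear adapters and a single ensembler on top of a foundation model) can be pictured with a minimal forward-pass sketch. This is not the authors' implementation. It assumes that foundation-model features are frozen and precomputed, that one adapter acts on image features and the other on text features, and that the ensembler is a learned weighting of two sets of logits. All names, shapes, and initial values below are hypothetical, and training (including the two stages) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, DIM, BATCH = 5, 16, 4  # hypothetical sizes


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


# Random stand-ins for frozen foundation-model outputs (assumption: precomputed).
image_feats = l2_normalize(rng.normal(size=(BATCH, DIM)))
text_feats = l2_normalize(rng.normal(size=(NUM_CLASSES, DIM)))  # one text embedding per class

# Two linear adapters (assumption: one per modality, identity-initialized).
W_img = np.eye(DIM)
W_txt = np.eye(DIM)


def similarity_logits(img, txt, w_img, w_txt):
    """Apply the linear adapters, then score each image against each class text."""
    img_adapted = l2_normalize(img @ w_img)
    txt_adapted = l2_normalize(txt @ w_txt)
    return img_adapted @ txt_adapted.T  # shape (BATCH, NUM_CLASSES)


def ensemble(logits_a, logits_b, w_ens):
    """Single ensembler (assumption: a weighted sum of two logit sources)."""
    stacked = np.stack([logits_a, logits_b], axis=-1)  # (BATCH, NUM_CLASSES, 2)
    return stacked @ w_ens  # (BATCH, NUM_CLASSES)


# Logits from the unadapted features and from the adapted features.
zero_shot_logits = similarity_logits(image_feats, text_feats, np.eye(DIM), np.eye(DIM))
adapted = similarity_logits(image_feats, text_feats, W_img, W_txt)

final_logits = ensemble(zero_shot_logits, adapted, np.array([0.5, 0.5]))
predictions = final_logits.argmax(axis=1)  # predicted class index per image
```

Only the two adapter matrices and the ensembler weights would be trainable in a setup like this, which is one plausible reading of why the abstract describes the method as lightweight.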

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses data imbalance in long-tailed medical image classification. Because labels for rare diseases are scarce, real-world medical datasets often follow a long-tailed distribution, which biases deep learning models towards common categories and lowers diagnostic accuracy on critical rare conditions. To tackle this challenge, the paper proposes a novel framework, Text-guided Foundation model Adaptation for Long-Tailed medical image classification (TFA-LT). The method uses a two-stage training strategy that builds on the representation learning capabilities of foundation models and draws on the richer associative representations available in the text space. Experimental results on two long-tailed medical image datasets show an accuracy improvement of up to 27.1% while requiring only 6.1% of the GPU memory used by the current best-performing algorithm. This indicates the potential of foundation model adaptation for long-tailed tasks.
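The bias towards common categories described above can be seen in a small worked example. The sketch below uses made-up class counts (not taken from the paper's datasets) and a degenerate classifier that always predicts the most frequent class, then compares overall accuracy with balanced accuracy (the mean of per-class recall).

```python
from collections import Counter

# Hypothetical label counts: one common condition and two rare ones.
labels = ["common"] * 950 + ["rare_a"] * 40 + ["rare_b"] * 10

# A degenerate classifier that always predicts the most frequent class.
majority = Counter(labels).most_common(1)[0][0]
predictions = [majority] * len(labels)


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


overall = accuracy(labels, predictions)             # 950 of 1000 correct
balanced = balanced_accuracy(labels, predictions)   # recalls are 1, 0, 0
imbalance_ratio = max(Counter(labels).values()) / min(Counter(labels).values())
```

Here overall accuracy is 0.95 even though neither rare class is ever detected, while balanced accuracy is about 0.33. This gap is why long-tailed methods are usually judged on per-class or balanced metrics rather than overall accuracy alone.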

