Abstract: As large language models (LLMs) and multimodal techniques mature, the development of general-purpose multimodal large language models (MLLMs) has surged, with significant applications in interpreting natural images. However, the field of pathology has largely remained untapped, particularly in gathering high-quality data and designing comprehensive model frameworks. To bridge this gap in pathology MLLMs, we present PathAsst, a multimodal generative foundation AI assistant intended to revolutionize diagnostic and predictive analytics in pathology. The development of PathAsst involves three pivotal steps: data acquisition, CLIP model adaptation, and the training of PathAsst's multimodal generative capabilities. First, we collect over 207K high-quality pathology image-text pairs from authoritative sources. Using ChatGPT, we generate over 180K instruction-following samples. We also devise additional instruction-following data specifically tailored for invoking eight pathology-specific sub-models that we prepared, allowing PathAsst to collaborate effectively with these models and enhancing its diagnostic ability. Second, using the collected data, we construct PathCLIP, a pathology-dedicated CLIP, to enhance PathAsst's capabilities in interpreting pathology images. Finally, we integrate PathCLIP with Vicuna-13b and use pathology-specific instruction-tuning data to enhance the multimodal generation capacity of PathAsst and strengthen its interactions with the sub-models. The experimental results of PathAsst show the potential of harnessing an AI-powered generative foundation model to improve pathology diagnosis and treatment processes. We open-source our dataset, as well as a comprehensive toolkit for extensive pathology data collection and preprocessing, at https://github.com/superjamessyx/Generative-Foundation-AI-Assistant-for-Pathology.
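
The abstract describes PathCLIP only as a pathology-dedicated CLIP trained on the collected image-text pairs. As a rough illustration of what CLIP-style adaptation generally optimizes, the sketch below computes a symmetric contrastive loss over a small batch of paired embeddings. It assumes the standard CLIP objective; the function names, the toy embeddings, and the temperature value are illustrative assumptions and are not taken from the PathAsst code.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Hypothetical helper: image_embs[k] and text_embs[k] are assumed to be
    # the embeddings of the k-th matching image-text pair in the batch.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Image-to-text direction: each image should pick out its own caption.
    loss_i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n
    # Text-to-image direction: each caption should pick out its own image.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n
    return 0.5 * (loss_i2t + loss_t2i)

# Toy batch: correctly paired embeddings versus swapped captions.
aligned = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the correctly paired embeddings yield a much lower loss than the swapped ones, which is the signal such training uses to pull matching pathology images and captions together in the shared embedding space.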

SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding

A Foundational Multimodal Vision Language AI Assistant for Human Pathology

PathGen-1.6M: 1.6 Million Pathology Image-text Pairs Generation through Multi-agent Collaboration

PathAlign: A vision-language model for whole slide images in histopathology

WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image

ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification

A Multimodal Generative AI Copilot for Human Pathology

WSI-VQA: Interpreting Whole Slide Images by Generative Visual Question Answering

Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction

WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images

Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification

PathAsst: Redefining Pathology through Generative Foundation AI Assistant for Pathology

PathAsst: A Generative Foundation AI Assistant towards Artificial General Intelligence of Pathology

PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding

Slide-based Graph Collaborative Training for Histopathology Whole Slide Image Analysis

CoD-MIL: Chain-of-Diagnosis Prompting Multiple Instance Learning for Whole Slide Image Classification

The Rise of AI Language Pathologists: Exploring Two-level Prompt Learning for Few-shot Weakly-supervised Whole Slide Image Classification

PathM3: A Multimodal Multi-Task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning

Multimodal Whole Slide Foundation Model for Pathology

Large-vocabulary forensic pathological analyses via prototypical cross-modal contrastive learning