A Multimodal Generative AI Copilot for Human Pathology

Ming Y. Lu, Bowen Chen, Drew F. K. Williamson, Richard J. Chen, Melissa Zhao, Aaron K. Chow, Kenji Ikemura, Ahrong Kim, Dimitra Pouli, Ankush Patel, Amr Soliman, Chengkuan Chen, Tong Ding, Judy J. Wang, Georg Gerber, Ivy Liang, Long Phi Le, Anil V. Parwani, Luca L. Weishaupt, Faisal Mahmood
DOI: https://doi.org/10.1038/s41586-024-07618-3
IF: 64.8
2024-06-13
Nature
Abstract: The field of computational pathology[1,2] has witnessed remarkable progress in the development of both task-specific predictive models and task-agnostic self-supervised vision encoders[3,4]. However, despite the explosive growth of generative artificial intelligence (AI), there has been limited study on building general-purpose, multimodal AI assistants and copilots[5] tailored to pathology. Here we present PathChat, a vision-language generalist AI assistant for human pathology. We build PathChat by adapting a foundational vision encoder for pathology, combining it with a pretrained large language model and finetuning the whole system on over 456,000 diverse visual-language instructions consisting of 999,202 question-answer turns. We compare PathChat against several multimodal vision-language AI assistants and GPT4V, which powers the commercially available multimodal general-purpose AI assistant ChatGPT-4[7]. PathChat achieved state-of-the-art performance on multiple-choice diagnostic questions from cases of diverse tissue origins and disease models. Furthermore, using open-ended questions and human expert evaluation, we found that overall PathChat produced more accurate and pathologist-preferable responses to diverse queries related to pathology. As an interactive and general vision-language AI copilot that can flexibly handle both visual and natural language inputs, PathChat can potentially find impactful applications in pathology education, research, and human-in-the-loop clinical decision making.
multidisciplinary sciences
What problem does this paper attempt to address?
The paper addresses the scarcity of general-purpose, multimodal generative AI assistants for computational pathology. Although the field has made significant progress with task-specific predictive models and self-supervised vision encoders, there had been limited work on building a general multimodal AI assistant tailored to pathology. To fill this gap, the authors developed PathChat, a vision-language generalist AI assistant for human pathology. PathChat adapts a foundational vision encoder for pathology, combines it with a pretrained large language model, and fine-tunes the entire system on over 456,000 diverse visual-language instructions comprising 999,202 question-answer turns. In the reported experiments, PathChat achieved state-of-the-art performance on multiple-choice diagnostic questions drawn from cases of diverse tissue origins and disease models, compared against several multimodal vision-language assistants and GPT4V. On open-ended questions judged by human experts, PathChat overall produced more accurate responses that pathologists preferred. Because it can flexibly handle both visual and natural language inputs, PathChat, as an interactive general vision-language assistant, could potentially support pathology education, research, and human-in-the-loop clinical decision making.
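To make the "vision encoder + pretrained language model + instruction fine-tuning" recipe more concrete, here is a minimal, dependency-free sketch of one common way such systems are wired (LLaVA-style): image features are projected into the language model's token-embedding space, prepended to the text, and the training loss is applied only to answer tokens across multi-turn question-answer exchanges. This is an illustrative assumption, not PathChat's actual implementation. The linear projector, the answer-only loss mask, and the names `project_patches`, `build_sample` and `Sample` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    """Compute y = W x + b, with W stored row-wise (out_dim x in_dim)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def project_patches(patch_features: List[Vector], w: Matrix, b: Vector) -> List[Vector]:
    """Hypothetical linear projector: map each vision-encoder patch feature
    into the language model's token-embedding space."""
    return [linear(p, w, b) for p in patch_features]


@dataclass
class Sample:
    inputs: List[Vector]  # embedding sequence fed to the language model
    loss_mask: List[int]  # 1 where the next-token loss is computed, else 0


def build_sample(image_tokens: List[Vector],
                 turns: List[Tuple[List[Vector], List[Vector]]]) -> Sample:
    """Assemble one multi-turn instruction sample: image tokens first, then
    alternating question/answer embeddings. Only answer tokens are supervised;
    image and question tokens serve as context."""
    inputs = list(image_tokens)
    mask = [0] * len(image_tokens)
    for question, answer in turns:
        inputs += question
        mask += [0] * len(question)
        inputs += answer
        mask += [1] * len(answer)
    return Sample(inputs, mask)


# Toy usage: 3 patch features of dim 4 projected to an embedding dim of 2.
patches = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
w = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 1.0]]
b = [0.0, 0.5]
img = project_patches(patches, w, b)

q1 = [[0.1, 0.1], [0.2, 0.2]]              # 2 question-token embeddings
a1 = [[0.3, 0.3], [0.4, 0.4], [0.5, 0.5]]  # 3 answer-token embeddings
sample = build_sample(img, [(q1, a1)])
print(len(sample.inputs), sum(sample.loss_mask))  # 8 3
```

Masking the loss to answer tokens is a common design choice in instruction tuning: the model learns to generate responses conditioned on the image and question rather than to reproduce the prompt. Real systems use learned tensors, tokenizers and a transformer backbone in place of these toy lists.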