Abstract: Medical images and radiology reports are crucial for diagnosing medical conditions, which makes their quantitative analysis important for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, including the medical field. Notably, the Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence (AGI) for computer vision, showcasing their potential in the biomedical domain. In this study, we evaluated Gemini, GPT-4, and 4 other popular large models across 14 medical imaging datasets, covering 5 medical imaging categories (dermatology, radiology, dentistry, ophthalmology, and endoscopy), and 3 radiology report datasets. The investigated tasks encompass disease classification, lesion segmentation, anatomical localization, disease diagnosis, report generation, and lesion detection. Our experimental results demonstrated that Gemini-series models excelled in report generation and lesion detection but faced challenges in disease classification and anatomical localization. Conversely, GPT-series models exhibited proficiency in lesion segmentation and anatomical localization but encountered difficulties in disease diagnosis and lesion detection. Additionally, both the Gemini series and the GPT series include models that demonstrated commendable generation efficiency. While both model series hold promise in reducing physician workload, alleviating pressure on limited healthcare resources, and fostering collaboration between clinical practitioners and artificial intelligence technologies, substantial enhancements and comprehensive validations remain imperative before clinical deployment.
