Abstract: The rapidly evolving field of Multi-modal Large Language Models (MLLMs) is at the forefront of integrating linguistic and visual processing in artificial intelligence. This paper presents an in-depth comparative study of two pioneering models: Google's Gemini and OpenAI's GPT-4V(ision). We evaluate both models across key dimensions such as Vision-Language Capability, Interaction with Humans, Temporal Understanding, and assessments of both Intelligence Quotient and Emotional Quotient. The core of our analysis examines the distinct visual comprehension abilities of each model. We conducted a series of structured experiments to evaluate their performance in various industrial application scenarios, offering a comprehensive perspective on their practical utility. Beyond direct performance comparisons, we also adjust prompts and scenarios to ensure a balanced and fair analysis. Our findings highlight the distinct strengths and niches of both models. GPT-4V distinguishes itself with precise and succinct responses, while Gemini excels at providing detailed, expansive answers accompanied by relevant imagery and links. These insights not only shed light on the comparative merits of Gemini and GPT-4V but also underscore the evolving landscape of multimodal foundation models, paving the way for future advancements in this area. After the comparison, we attempted to achieve better results by combining the two models. Finally, we would like to express our profound gratitude to the teams behind GPT-4V and Gemini for their pioneering contributions to the field. We also acknowledge the comprehensive qualitative analysis presented in 'Dawn' by Yang et al. This work, with its extensive collection of image samples, prompts, and GPT-4V-related results, provided a foundational basis for our analysis.

Comparison of Multi-Modal Large Language Models with Deep Learning Models for Medical Image Classification

Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

Multimodal Large Language Models are Generalist Medical Image Interpreters

An Early Investigation into the Utility of Multimodal Large Language Models in Medical Imaging

Evaluating Large Language Model (LLM) Performance on Established Breast Classification Systems

Evaluating LLM-Generated Multimodal Diagnosis from Medical Images and Symptom Analysis

Gemini Goes to Med School: Exploring the Capabilities of Multimodal Large Language Models on Medical Challenge Problems & Hallucinations

A comparison of the diagnostic ability of large language models in challenging clinical cases

Evaluating the Diagnostic Performance of Large Language Models in Identifying Complex Multisystemic Syndromes: A Comparative Study with Radiology Residents

How do large language models answer breast cancer quiz questions? A comparative study of GPT-3.5, GPT-4 and Google Gemini

Evaluating General Vision-Language Models for Clinical Medicine

Multi-modal large language models in radiology: principles, applications, and potential

Evaluating AI Proficiency in Nuclear Cardiology: Large Language Models take on the Board Preparation Exam

Comparative Analysis of GPT-4Vision, GPT-4 and Open Source LLMs in Clinical Diagnostic Accuracy: A Benchmark Against Human Expertise

NIMG-67. The Current State Of Large Language Models And Neuro-Oncology Imaging

Pre-trained multimodal large language model enhances dermatological diagnosis using SkinGPT-4

Digital Diagnostics: The Potential Of Large Language Models In Recognizing Symptoms Of Common Illnesses

Reliability of large language models for advanced head and neck malignancies management: a comparison between ChatGPT 4 and Gemini Advanced

Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases

Visual-Textual Integration in LLMs for Medical Diagnosis: A Quantitative Analysis

A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine