Comparison of Multi-Modal Large Language Models with Deep Learning Models for Medical Image Classification

J. Than, Wan-Tze Vong, Kelvin Sheng Chek Yong
DOI: https://doi.org/10.1109/ICSIPA62061.2024.10687159
2024-09-03
Abstract: In recent years, the advancement of large language models (LLMs) such as GPT-4 and Gemini has opened new avenues for artificial intelligence applications in various domains, including medical image classification. This study aims to compare the performance of multi-modal LLMs with state-of-the-art deep learning networks in classifying tumour and non-tumour images. The performance of four multi-modal LLMs and four conventional deep learning methods was evaluated using several performance measures. The results demonstrate the strengths and limitations of both approaches, providing insights into their applicability and potential integration in clinical practice. Gemini 1.5 Pro performs best of the eight models evaluated. This comparison underscores the evolving role of AI in enhancing diagnostic accuracy and supporting medical professionals in disease detection, especially when training data is scarce.
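The abstract does not name the performance measures used. As a minimal sketch, assuming a binary tumour/non-tumour labelling with "tumour" as the positive class, the confusion-matrix measures commonly reported for such a task can be computed as below. The helper `binary_metrics`, the `BinaryMetrics` record and the label strings are hypothetical illustrations, not the paper's evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float    # positive predictive value
    recall: float       # sensitivity / true positive rate
    specificity: float  # true negative rate
    f1: float


def _safe_div(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a denominator is zero
    # (e.g. no positive predictions in the sample).
    return num / den if den else 0.0


def binary_metrics(y_true, y_pred, positive="tumour"):
    """Hypothetical helper: confusion-matrix metrics for a two-class task.

    `positive` is the label treated as the positive class (assumed "tumour");
    any other label is counted as negative.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # tumour correctly flagged
            else:
                fp += 1  # non-tumour flagged as tumour
        else:
            if t == positive:
                fn += 1  # tumour missed
            else:
                tn += 1  # non-tumour correctly cleared
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    return BinaryMetrics(
        accuracy=_safe_div(tp + tn, len(y_true)),
        precision=precision,
        recall=recall,
        specificity=_safe_div(tn, tn + fp),
        f1=_safe_div(2 * precision * recall, precision + recall),
    )


if __name__ == "__main__":
    truth = ["tumour", "tumour", "tumour", "non-tumour", "non-tumour"]
    preds = ["tumour", "tumour", "non-tumour", "non-tumour", "tumour"]
    print(binary_metrics(truth, preds))
```

In the toy example there are 2 true positives, 1 false negative, 1 true negative and 1 false positive, giving an accuracy of 0.6 and a specificity of 0.5. Sensitivity and specificity are reported separately because, in a screening setting, a missed tumour (false negative) and a false alarm (false positive) carry different clinical costs.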
Medicine, Computer Science
What problem does this paper attempt to address?