Comparison of Vision Transformers and Convolutional Neural Networks in Medical Image Analysis: A Systematic Review
Satoshi Takahashi, Yusuke Sakaguchi, Nobuji Kouno, Ken Takasawa, Kenichi Ishizu, Yu Akagi, Rina Aoyama, Naoki Teraya, Amina Bolatkan, Norio Shinkai, Hidenori Machino, Kazuma Kobayashi, Ken Asada, Masaaki Komatsu, Syuzo Kaneko, Masashi Sugiyama, Ryuji Hamamoto
DOI: https://doi.org/10.1007/s10916-024-02105-8
IF: 4.92
2024-09-13
Journal of Medical Systems
Abstract: In the rapidly evolving field of medical image analysis utilizing artificial intelligence (AI), the selection of appropriate computational models is critical for accurate diagnosis and patient care. This literature review provides a comprehensive comparison of vision transformers (ViTs) and convolutional neural networks (CNNs), the two leading techniques in the field of deep learning in medical imaging. We conducted a systematic survey of the literature. Particular attention was given to the robustness, computational efficiency, scalability, and accuracy of these models in handling complex medical datasets. The review incorporates findings from 36 studies and indicates a collective trend that transformer-based models, particularly ViTs, exhibit significant potential in diverse medical imaging tasks, showing superior performance compared with conventional CNN models. Additionally, it is evident that pre-training is important for transformer applications. We expect this work to help researchers and practitioners select the most appropriate model for specific medical image analysis tasks, accounting for the current state of the art and future trends in the field.
health care sciences & services, medical informatics