UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models

Quan Van Nguyen, Huy Quang Pham, Dan Quang Tran, Thang Kien-Bao Nguyen, Nhat-Hao Nguyen-Dang, Bao-Thien Nguyen-Tat
2024-05-28
Abstract: Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. Methods: In our participation in the ImageCLEFmedical2024 Caption evaluation campaign, we explored caption prediction tasks using advanced Transformer-based models. We developed methods incorporating Transformer encoder-decoder and Query Transformer architectures. These models were trained and evaluated to generate diagnostic captions from radiology images. Results: Experimental evaluations demonstrated the effectiveness of our models, with the VisionDiagnostor-BioBART model achieving the highest BERTScore of 0.6267. This performance contributed to our team, DarkCow, achieving third place on the leaderboard. Conclusion: Our diagnostic captioning models show great promise in aiding medical professionals by generating high-quality reports efficiently. This approach can facilitate better data processing and performance optimization in medical imaging departments, ultimately benefiting healthcare delivery.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the automatic generation of diagnostic captions from radiology images. The goal is to help healthcare professionals reduce clinical errors and improve productivity. For the ImageCLEFmedical 2024 Caption task, the team explored Transformer-based models, specifically Transformer encoder-decoder and Query Transformer architectures. Their VisionDiagnostor-BioBART model achieved the highest BERTScore, 0.6267, and the team placed third on the leaderboard.

The paper first motivates diagnostic captioning: it can help junior doctors avoid errors and let experienced doctors produce reports faster. It then describes the team's participation in the ImageCLEFmedical Caption task in detail, covering data processing, image preprocessing methods, model design, and evaluation metrics. The experimental section demonstrates the effectiveness of the proposed models and analyzes how the choice of model, image preprocessing, and caption length affect performance.

Two findings stand out. First, image preprocessing may decrease performance for certain models. Second, performance on longer captions differs across models, with some handling them better than others.

In conclusion, the proposed diagnostic captioning models have the potential to improve data processing efficiency and performance optimization in medical imaging departments, and thereby healthcare services.
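Since the headline result is reported as a BERTScore, it may help to see roughly how that metric is computed. The following is a minimal sketch of the greedy-matching precision, recall, and F1 at the core of BERTScore. It is not the campaign's official scorer: the function names are hypothetical, the token vectors are hand-made toy values rather than contextual embeddings from a BERT-style model, and optional refinements such as IDF weighting and baseline rescaling are omitted.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_bertscore(candidate_vecs, reference_vecs):
    """Return (precision, recall, f1) using greedy token matching.

    Assumption: each argument is a non-empty list of non-zero token
    vectors. In a real setup these would come from a language model;
    here they are supplied directly by the caller.
    """
    # Precision: match every candidate token to its most similar reference token.
    precision = sum(
        max(cosine(c, r) for r in reference_vecs) for c in candidate_vecs
    ) / len(candidate_vecs)
    # Recall: match every reference token to its most similar candidate token.
    recall = sum(
        max(cosine(r, c) for c in candidate_vecs) for r in reference_vecs
    ) / len(reference_vecs)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: the candidate covers only one of the two reference tokens.
toy_candidate = [[1.0, 0.0]]
toy_reference = [[1.0, 0.0], [0.0, 1.0]]
toy_precision, toy_recall, toy_f1 = greedy_bertscore(toy_candidate, toy_reference)
```

In this toy case the single candidate token matches a reference token exactly, so precision is high, while the unmatched reference token pulls recall down. The F1 value combines the two in the same way a reported BERTScore F1 would.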