UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models

Quan Van Nguyen, Huy Quang Pham, Dan Quang Tran, Thang Kien-Bao Nguyen, Nhat-Hao Nguyen-Dang, Bao-Thien Nguyen-Tat

2024-05-28

Abstract: Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. Methods: In our participation in the ImageCLEFmedical 2024 Caption evaluation campaign, we explored caption prediction tasks using advanced Transformer-based models. We developed methods incorporating Transformer encoder-decoder and Query Transformer architectures. These models were trained and evaluated to generate diagnostic captions from radiology images. Results: Experimental evaluations demonstrated the effectiveness of our models, with the VisionDiagnostor-BioBART model achieving the highest BERTScore of 0.6267. This performance contributed to our team, DarkCow, achieving third place on the leaderboard. Conclusion: Our diagnostic captioning models show great promise in aiding medical professionals by generating high-quality reports efficiently. This approach can facilitate better data processing and performance optimization in medical imaging departments, ultimately benefiting healthcare delivery.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses automated diagnostic captioning: generating diagnostic text directly from radiology images so that healthcare professionals make fewer clinical errors and work more productively. For the ImageCLEFmedical Caption 2024 challenge, the team explored Transformer-based models, specifically Transformer encoder-decoder and Query Transformer architectures. Their VisionDiagnostor-BioBART model achieved the highest BERTScore, 0.6267, which contributed to the team's third-place finish in the competition.

The paper first motivates diagnostic captioning, which can help junior doctors avoid errors and help experienced doctors produce reports faster. It then describes the team's participation in the ImageCLEFmedical Caption task in detail: data processing, image preprocessing methods, model design, and evaluation metrics.

The experiments demonstrate the effectiveness of the proposed models and analyze how model choice, image preprocessing, and caption length affect performance. Image preprocessing can lower performance for certain models, and models differ in how well they handle longer captions, with some performing better than others in that setting.

The paper concludes that the proposed diagnostic captioning models could improve data processing efficiency and performance optimization in medical imaging departments, and ultimately healthcare delivery.
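For readers unfamiliar with the headline metric, the following is a minimal sketch of how a BERTScore-style value is computed from token embeddings. It is not the authors' evaluation code. Real BERTScore takes contextual embeddings from a pretrained language model and may apply IDF weighting and baseline rescaling; here the embeddings are hand-made toy vectors, the function names are hypothetical, and the source does not say which configuration or which component (precision, recall, or F1) produced the reported 0.6267.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bertscore_from_embeddings(cand, ref):
    """Greedy-matching precision, recall, and F1 over token embeddings.

    cand, ref: non-empty lists of token vectors for the generated caption
    and the reference caption. Hypothetical helper, for illustration only.
    """
    sims = [[cosine(c, r) for r in ref] for c in cand]
    # Precision: each candidate token is matched to its most similar reference token.
    precision = sum(max(row) for row in sims) / len(cand)
    # Recall: each reference token is matched to its most similar candidate token.
    recall = sum(
        max(sims[i][j] for i in range(len(cand))) for j in range(len(ref))
    ) / len(ref)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy embeddings (assumed values, not model outputs).
cand = [[1.0, 0.0], [0.0, 1.0]]
ref = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
p, r, f = bertscore_from_embeddings(cand, ref)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
```

In this toy case every candidate token has an exact match in the reference, so precision is 1, while the third reference token is only partially covered, which pulls recall and F1 below 1. The metric rewards semantic overlap between token embeddings, not exact word overlap.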
