Abstract: Background and objective: Medical imaging techniques are widely employed in disease diagnosis and treatment. A readily available medical report can help an expert investigate a patient's health. A radiologist preparing a final report can benefit from a system that automatically translates medical images into radiological reports. Previous attempts at automatic medical report generation have relied on image captioning algorithms that do not take domain-specific visual and textual content into account, which raises questions about the credibility of the generated reports.
Methods: In this work, a novel Adaptive Multilevel Multi-Attention (AMLMA) approach is proposed that exploits domain-specific visual-textual knowledge to generate a thorough and credible radiological report for any view of a human chest X-ray image. The approach builds on the encoder-decoder framework combined with multiple adaptive attention mechanisms. A convolutional neural network (CNN) with a residual attention module (RAM) is shown to be a strong visual encoder for multi-label abnormality detection. Multilevel visual features (local and global) are extracted from this visual encoder to retrieve regional-level and abstract-level radiology-based semantic information. Word2Vec and FastText word embeddings are trained on medical reports to acquire radiological knowledge and are then used as textual encoders whose outputs are fed to a Bi-directional Long Short-Term Memory (Bi-LSTM) network, which learns the correlations between medical terms in radiological reports. AMLMA employs a weighted multilevel association of adaptive visual-semantic attention and visual-based linguistic attention mechanisms. This association of adaptive attention mechanisms serves as the decoder and yields significant improvements in the report generation task.
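The abstract does not give the AMLMA attention equations. As a rough illustration of what a single adaptive attention step can look like, the sketch below follows the visual-sentinel formulation of adaptive attention from Lu et al. (CVPR 2017): the decoder state scores k regional (local) features together with a sentinel vector that stands in for linguistic context, and the sentinel's share, beta, determines how much the next word draws on the image versus the language model. The dot-product scoring (in place of a learned projection), the function names and the toy values are assumptions made for illustration; this is not the authors' implementation. Pure Python is used to keep the block self-contained.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product score between the decoder state and a candidate vector."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the maximum score first)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_attention(
    hidden: Vector, regions: List[Vector], sentinel: Vector
) -> Tuple[Vector, List[float], float]:
    """Attend jointly over k regional (local) features and one sentinel vector.

    Assumes hidden, every region and the sentinel share the same dimension.
    Returns (context, region_weights, beta):
      - beta is the weight on the sentinel, i.e. how much this step relies on
        non-visual (linguistic) information;
      - region_weights sum to 1 - beta;
      - context = beta * sentinel + sum_i region_weights[i] * regions[i].
    """
    candidates = regions + [sentinel]
    weights = softmax([dot(hidden, c) for c in candidates])
    beta = weights[-1]
    context = [
        sum(w * c[i] for w, c in zip(weights, candidates))
        for i in range(len(hidden))
    ]
    return context, weights[:-1], beta


# Toy usage: two 2-d regional features and a zero sentinel (hypothetical values).
hidden = [1.0, 0.0]
regions = [[1.0, 0.0], [0.0, 1.0]]
sentinel = [0.0, 0.0]
context, region_weights, beta = adaptive_attention(hidden, regions, sentinel)
```

Because the softmax runs over the k regions and the sentinel together, the region weights automatically shrink by a factor of (1 - beta) when the decoder leans on linguistic context, so no separate gate has to be normalised.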
Results: The proposed approach is evaluated on the publicly available Indiana University chest X-ray (IU-CXR) dataset. The CNN with RAM shows significant improvement in recall (0.4423), precision (0.1803) and F1-score (0.2551) for predicting multiple abnormalities in an X-ray image. Language generation metrics for the proposed variants were computed with the COCO-caption evaluation Application Programming Interface (API). The AMLMA model with trained embeddings generates convincing radiology reports and outperforms state-of-the-art (SOTA) approaches, with scores of 0.172 for BLEU-4, 0.247 for METEOR, 0.376 for ROUGE-L and 0.381 for CIDEr. In addition, a new "Unique Index" (UI) statistic is introduced to highlight the model's ability to generate unique reports.
Conclusion: The overall architecture aids in understanding various X-ray image views and in generating relevant normal and abnormal radiography statements. The proposed model emphasizes multilevel visual-textual knowledge, with an adaptive attention mechanism that balances visual and linguistic information to generate an admissible radiology report.
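The abnormality-detection scores reported in the Results are standard multi-label metrics. A minimal sketch of how micro-averaged precision, recall and F1 can be computed from per-image label vectors follows; the 0.5 decision threshold, the micro-averaging and the toy labels are assumptions, since the abstract does not state how predictions were thresholded or averaged. Averaging matters here: the harmonic mean of the reported precision and recall, 2(0.1803)(0.4423)/(0.1803 + 0.4423), is about 0.256, slightly above the reported F1 of 0.2551, a gap that can arise when F1 is averaged per label or per image rather than derived from the aggregate precision and recall.

```python
from typing import List, Tuple


def multilabel_prf(
    y_true: List[List[int]], y_score: List[List[float]], threshold: float = 0.5
) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 over all (image, label) pairs.

    y_true[n][k] is 1 if abnormality k is present in image n, else 0;
    y_score[n][k] is the model's predicted probability for that abnormality.
    """
    tp = fp = fn = 0
    for truth, scores in zip(y_true, y_score):
        for t, s in zip(truth, scores):
            predicted = s >= threshold  # binarise with an assumed threshold
            if predicted and t:
                tp += 1
            elif predicted and not t:
                fp += 1
            elif t and not predicted:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: two images, three hypothetical abnormality labels.
y_true = [[1, 0, 1], [0, 1, 0]]
y_score = [[0.9, 0.6, 0.2], [0.1, 0.7, 0.8]]
precision, recall, f1 = multilabel_prf(y_true, y_score)
```

Macro-averaging would instead compute precision, recall and F1 separately for each abnormality label and then average them, which gives rare findings the same weight as common ones.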

Adversarial Training with Comprehensive Objective for Medical Image Report Generation

Adaptively Multi-Objective Adversarial Training for Medical Image Report Generation

An Inclusive Task-Aware Framework for Radiology Report Generation

Hierarchical medical image report adversarial generation with hybrid discriminator

Interactive dual-stream contrastive learning for radiology report generation

Multi-Grained Radiology Report Generation With Sentence-Level Image-Language Contrastive Learning

On the Automatic Generation of Medical Imaging Reports

Translating medical image to radiological report: Adaptive multilevel multi-attention approach

Generating radiology reports via auxiliary signal guidance and a memory-driven network

Multifocal region-assisted cross-modality learning for chest X-ray report generation

MATNet: Exploiting Multi-Modal Features for Radiology Report Generation

[Research on automatic generation of multimodal medical image reports based on memory driven]

Generative Adversarial Network for Medical Images (MI-GAN)

A Self-Guided Framework for Radiology Report Generation

Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation

A medical report generation method integrating teacher–student model and encoder–decoder network

Learning to Generate Radiology Findings from Impressions Based on Large Language Model

Retrieval Augmented Chest X-Ray Report Generation using OpenAI GPT models

A Medical Semantic-Assisted Transformer for Radiographic Report Generation

Multi-modal Fusion with Semantic Supervision for Radiology Report Generation

Automatically Generating Narrative-Style Radiology Reports from Volumetric CT Images; a Proof of Concept