Abstract: Background and objective: Medical imaging techniques are widely employed in disease diagnosis and treatment. A readily available medical report can help an expert investigate the patient's health. A radiologist preparing a final report can benefit from an automatic system that translates medical images into radiological reports. Previous attempts at automatic medical report generation rely on image captioning algorithms that do not take domain-specific visual and textual content into account, which raises questions about the credibility of the generated reports. Methods: In this work, a novel Adaptive Multilevel Multi-Attention (AMLMA) approach is proposed. It exploits domain-specific visual-textual knowledge to generate a thorough and believable radiological report for any view of a human chest X-ray image. The proposed approach uses the encoder-decoder framework combined with multiple adaptive attention mechanisms. A convolutional neural network (CNN) with a residual attention module (RAM) is shown to be a strong visual encoder for multi-label abnormality detection. Multilevel visual features (local and global) are extracted from the proposed visual encoder to retrieve regional-level and abstract-level radiology-based semantic information. Word2Vec and FastText word embeddings are trained on medical reports to acquire radiological knowledge. They are then used as textual encoders, feeding a Bi-directional Long Short-Term Memory (Bi-LSTM) network that learns the relationships between medical terms in radiological reports. AMLMA employs a weighted multilevel association of adaptive visual-semantic attention and visual-based linguistic attention mechanisms. This association of adaptive attention mechanisms serves as the decoder and yields significant improvements in the report generation task.
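The abstract describes adaptive attention that balances visual and linguistic information but does not give its equations. A minimal sketch, assuming the visual-sentinel formulation of Lu et al. (2017, "Knowing When to Look"): the decoder attends over regional features and, through a gate `beta`, decides how much to rely on a language "sentinel" vector instead. The dot-product scores stand in for the learned projections of the real model, and the function and argument names are hypothetical, not taken from AMLMA.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_attention(
    regions: List[Vector], hidden: Vector, sentinel: Vector
) -> Tuple[Vector, List[float], float]:
    """Hypothetical sketch of sentinel-gated attention.

    regions:  k regional (local) visual feature vectors, each of size d
    hidden:   decoder hidden state of size d
    sentinel: linguistic sentinel vector of size d
    Assumption: scores are plain dot products with the hidden state,
    not the learned tanh/linear scoring used in trained models.
    """
    region_scores = [dot(v, hidden) for v in regions]
    sentinel_score = dot(sentinel, hidden)

    # Attention weights over the k image regions only.
    alpha = softmax(region_scores)
    # Gate: share of attention given to the sentinel in the extended softmax.
    beta = softmax(region_scores + [sentinel_score])[-1]

    dim = len(hidden)
    visual_context = [sum(a * v[j] for a, v in zip(alpha, regions)) for j in range(dim)]
    # Mix linguistic (sentinel) and visual context according to the gate.
    context = [beta * s + (1.0 - beta) * c for s, c in zip(sentinel, visual_context)]
    return context, alpha, beta


if __name__ == "__main__":
    ctx, alpha, beta = adaptive_attention([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], [0.0, 0.0])
    print("alpha:", alpha, "beta:", beta, "context:", ctx)
```

When `beta` approaches 1 the next word is predicted mostly from the language state, which suits function words such as "the" or "no". When it approaches 0 the word is grounded in image regions, which suits findings such as "effusion".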
Results: The proposed approach is evaluated on the publicly available Indiana University chest X-ray (IU-CXR) dataset. The CNN with RAM shows a significant improvement in recall (0.4423), precision (0.1803) and F1-score (0.2551) for predicting multiple abnormalities in an X-ray image. Language generation metrics for the proposed variants were computed using the COCO-caption evaluation Application Programming Interface (API). The AMLMA model with trained embeddings generates convincing radiology reports and outperforms state-of-the-art (SOTA) approaches, with scores of BLEU-4 (0.172), METEOR (0.247), ROUGE-L (0.376) and CIDEr (0.381). In addition, a new "Unique Index" (UI) statistic is introduced to highlight the model's ability to generate unique reports. Conclusion: The overall architecture helps in understanding various X-ray image views and in generating relevant normal and abnormal radiography statements. The proposed model emphasizes multilevel visual-textual knowledge and uses an adaptive attention mechanism to balance visual and linguistic information when generating admissible radiology reports.
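The abnormality-detection scores above come from a multi-label setting, where each image can carry several findings at once. The abstract does not state which averaging was used. A minimal sketch, assuming micro-averaging over binary label vectors (one column per abnormality), shows how precision, recall and F1 are pooled across all images and labels. The function name `micro_prf` is hypothetical.

```python
from typing import List, Tuple


def micro_prf(y_true: List[List[int]], y_pred: List[List[int]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall and F1 for multi-label predictions.

    Each row is one image; each column is one abnormality label (1 = present).
    Assumption: micro-averaging; the source does not specify the averaging scheme.
    """
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t and p:
                tp += 1  # correctly predicted abnormality
            elif p and not t:
                fp += 1  # predicted but not present
            elif t and not p:
                fn += 1  # present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truth = [[1, 0, 1, 0], [0, 1, 0, 0]]
    preds = [[1, 1, 0, 0], [0, 1, 0, 1]]
    print(micro_prf(truth, preds))
```

Under micro-averaging, F1 is the harmonic mean of the pooled precision and recall. Plugging the reported precision (0.1803) and recall (0.4423) into that formula gives about 0.256 rather than 0.2551. This suggests the reported F1 may have been averaged differently, for example per image or per label.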

Multi-Level Objective Alignment Transformer for Fine-Grained Oral Panoramic X-ray Report Generation

Automatic Report Generation Method Based on Multiscale Feature Extraction and Word Attention Network

LETA: Tooth Alignment Prediction Based on Dual-branch Latent Encoding

AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation

Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation

Eye Gaze Guided Cross-Modal Alignment Network for Radiology Report Generation

Automated Radiographic Report Generation Purely on Transformer: A Multicriteria Supervised Approach

MATNet: Exploiting Multi-Modal Features for Radiology Report Generation

Teeth Mold Point Cloud Completion Via Data Augmentation and Hybrid RL-GAN

Large Language Model with Region-guided Referring and Grounding for CT Report Generation

Translating medical image to radiological report: Adaptive multilevel multi-attention approach

Towards a Holistic Framework for Multimodal Large Language Models in Three-dimensional Brain CT Report Generation

Automatic Radiology Reports Generation via Memory Alignment Network

Radiology Report Generation with a Learned Knowledge Base and Multi-Modal Alignment

DKA-RG: Disease-Knowledge-Enhanced Fine-Grained Image–Text Alignment for Automatic Radiology Report Generation

YOLOrtho -- A Unified Framework for Teeth Enumeration and Dental Disease Detection

DPML: Prior-Guided Multitask Learning for Dental Object Recognition on Limited Panoramic Radiograph Dataset

Visual prior-based cross-modal alignment network for radiology report generation

Multifocal region-assisted cross-modality learning for chest X-ray report generation

Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation

LCAT-Net: Lightweight Context-Aware Deep Learning Approach for Teeth Segmentation in Panoramic X-rays