Abstract: The detailed physiological perspectives captured by medical imaging provide actionable insights that help doctors manage the comprehensive care of patients. However, the quality of such diagnostic imaging modalities is often degraded by mismanagement of the image capture process by poorly trained technicians and by older or poorly maintained imaging equipment. Further, a patient is often scanned at different orientations to capture the frontal, lateral and sagittal views of the affected areas. Due to the large volume of diagnostic scans performed at a modern hospital, adequate documentation of these additional perspectives is mostly overlooked, even though it is an essential element of quality diagnostic systems and predictive analytics systems. Another crucial challenge for effective medical image data management is that diagnostic scans are essentially stored as unstructured data, with no well-defined processing methodology to enable intelligent image data management for applications such as similar patient retrieval and automated disease prediction. One solution is to generate automated descriptions of the observations and findings in diagnostic images by leveraging computer vision and natural language processing. In this work, we present multi-task neural models capable of addressing these critical challenges. We propose ESRGAN, an image enhancement technique for improving the quality and visualization of medical chest x-ray images, thereby substantially improving the potential for accurate diagnosis, automatic detection and region-of-interest segmentation. We also propose a CNN-based model called ViewNet for predicting the view orientation of the x-ray image, and we generate a medical report using Xception net, thus facilitating a robust medical image management system for intelligent diagnosis applications. Experimental results are reported using standard metrics such as BRISQUE, PIQE and BLEU scores, indicating that the proposed models achieved excellent performance. Further, the proposed deep learning approaches enable diagnosis in less time, and their hybrid architecture shows significant potential for supporting many intelligent diagnosis applications.

Efficient Document Image Classification Using Region-Based Graph Neural Network

Knowledge-based Document Embedding for Cross-Domain Text Classification

CNN Based Page Object Detection in Document Images

Grading of Diabetic Retinopathy Images Based on Graph Neural Network

Document Image Classification with Intra-Domain Transfer Learning and Stacked Generalization of Deep Convolutional Neural Networks

Key-Guided Identity Document Classification Method by Graph Attention Network

Evaluation of Deep Convolutional Nets for Document Image Classification and Retrieval

Deep neural models for automated multi-task diagnostic scan management—quality enhancement, view classification and report generation

High-Resolution Image Classification with Rich Text Information Based on Graph Convolution Neural Network

Analysis of Convolutional Neural Networks for Document Image Classification

Efficient Region-Based Image Querying

Fusing Global Domain Information and Local Semantic Information to Classify Financial Documents

GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification

Real-Time Document Image Classification using Deep CNN and Extreme Learning Machines

Multimodal Pre-Training Based on Graph Attention Network for Document Understanding

Effective Document Image Rectification via a Deep Learning Framework

A Fast Fully Octave Convolutional Neural Network for Document Image Segmentation

Dating Documents using Graph Convolution Networks

Document AI: A Comparative Study of Transformer-Based, Graph-Based Models, and Convolutional Neural Networks For Document Layout Analysis

Doc2Im: document to image conversion through self-attentive embedding

DUBLIN -- Document Understanding By Language-Image Network