Learning Generalized Medical Image Representations through Image-Graph Contrastive Pretraining

Sameer Khanna, Daniel Michael, Marinka Zitnik, Pranav Rajpurkar
2024-05-15
Abstract: Medical image interpretation using deep learning has shown promise but often requires extensive expert-annotated datasets. To reduce this annotation burden, we develop an Image-Graph Contrastive Learning framework that pairs chest X-rays with structured report knowledge graphs automatically extracted from radiology notes. Our approach uniquely encodes the disconnected graph components via a relational graph convolution network and transformer attention. In experiments on the CheXpert dataset, this novel graph encoding strategy enabled the framework to outperform existing methods that use image-text contrastive learning in 1% linear evaluation and few-shot settings, while achieving comparable performance to radiologists. By exploiting unlabeled paired images and text, our framework demonstrates the potential of structured clinical insights to enhance contrastive learning for medical images. This work points toward reducing demands on medical experts for annotations, improving diagnostic precision, and advancing patient care through robust medical image understanding.
Image and Video Processing, Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
The paper addresses the heavy reliance of medical image analysis on expert-annotated data. Although deep learning has shown great potential in medical image interpretation, training usually requires large expert-annotated datasets, and manual annotation is time-consuming and resource-intensive, especially at the scale of tens of thousands of images. The goal is therefore to train strong deep learning systems with far less manually annotated data, reducing the annotation burden from hundreds of thousands of images to thousands.

To this end, the authors propose a framework called "Image-Graph Contrastive Learning" (IGCL). It pairs chest X-ray images with structured knowledge graphs automatically extracted from radiology reports and uses these pairs to train the model. Because the extracted graphs contain disconnected components, the method encodes them with a Relational Graph Convolutional Network (RGCN) combined with transformer attention.

On the CheXpert dataset, IGCL outperforms existing image-text contrastive learning methods in 1% linear evaluation and few-shot settings, and its performance is comparable to that of radiologists. In summary, the main contribution is a method that leverages structured clinical insights to enhance contrastive learning, which points toward reducing the need for expert annotations, improving diagnostic accuracy, and advancing patient care through robust medical image understanding.
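To make the contrastive pairing of images and graphs concrete, here is a minimal sketch of a symmetric contrastive objective over a batch of paired embeddings. It is an illustration under stated assumptions and not the authors' implementation: the summary does not give the exact loss, so the symmetric InfoNCE form common in image-text contrastive methods is assumed, and the function names, the temperature value, and the toy embeddings are all hypothetical. The embeddings are assumed to come from an image encoder and from the RGCN-plus-transformer graph encoder, neither of which is shown.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def similarity_logits(image_embs, graph_embs, temperature):
    """Pairwise cosine similarities between every image and every graph, divided by temperature."""
    images = [l2_normalize(v) for v in image_embs]
    graphs = [l2_normalize(v) for v in graph_embs]
    return [
        [sum(a * b for a, b in zip(img, g)) / temperature for g in graphs]
        for img in images
    ]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i (the true pair)."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_embs, graph_embs, temperature=0.1):
    """Assumed symmetric InfoNCE loss: average of image-to-graph and graph-to-image terms."""
    logits = similarity_logits(image_embs, graph_embs, temperature)
    transposed = [list(column) for column in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


if __name__ == "__main__":
    # Toy 2-D embeddings: row i of each list is assumed to come from the same study.
    image_embs = [[1.0, 0.0], [0.0, 1.0]]
    aligned_graph_embs = [[1.0, 0.0], [0.0, 1.0]]
    swapped_graph_embs = [[0.0, 1.0], [1.0, 0.0]]
    print("aligned pairs:", symmetric_contrastive_loss(image_embs, aligned_graph_embs))
    print("swapped pairs:", symmetric_contrastive_loss(image_embs, swapped_graph_embs))
```

The loss is small when each image embedding is closest to the embedding of its own report graph and large when the pairs are swapped, which is the signal that lets paired but unlabeled images and reports drive pretraining.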