A Disease Labeler for Chinese Chest X-Ray Report Generation

Mengwei Wang, Ruixin Yan, Zeyi Hou, Ning Lang, Xiuzhuang Zhou
2024-03-18
Abstract: In the field of medical image analysis, the scarcity of Chinese chest X-ray report datasets has hindered the development of technology for generating Chinese chest X-ray reports. On one hand, the construction of a Chinese chest X-ray report dataset is limited by the time-consuming and costly process of accurate expert disease annotation. On the other hand, a single natural language generation metric is commonly used to evaluate the similarity between generated and ground-truth reports, whereas assessing the clinical accuracy and effectiveness of the generated reports requires an accurate disease labeler (classifier). To address these issues, this study proposes a disease labeler tailored for the generation of Chinese chest X-ray reports. The labeler uses a dual BERT architecture to handle diagnostic reports and clinical information separately, and it constructs a hierarchical label learning algorithm based on the affiliation between diseases and body parts to enhance text classification performance. Using this disease labeler, a Chinese chest X-ray report dataset comprising 51,262 report samples was established. Finally, experiments and analyses were conducted on a subset of expert-annotated Chinese chest X-ray reports, validating the effectiveness of the proposed disease labeler.
Machine Learning, Artificial Intelligence, Computation and Language, Image and Video Processing
What problem does this paper attempt to address?
The paper attempts to solve two main problems:

1. **Scarcity of Chinese chest X-ray report datasets**: In medical image analysis, datasets of Chinese chest X-ray reports are very scarce, which hinders the development of technology for generating such reports. Building these datasets is constrained by the time-consuming and costly process of accurate disease-label annotation by experts.
2. **Evaluation of generated reports**: Existing evaluation methods rely mainly on natural language generation (NLG) metrics, which measure the textual similarity between generated and ground-truth reports but cannot assess whether the diseases described in a generated report are correct. Evaluating disease prediction in generated reports requires an accurate disease labeler (classifier).

To solve these problems, the paper proposes a disease labeler designed specifically for Chinese chest X-ray report generation. The labeler adopts a dual-BERT architecture that processes diagnostic reports and clinical information separately, and it constructs a hierarchical label learning algorithm based on the affiliation between diseases and body parts to improve text classification performance. Using this labeler, the researchers established a Chinese chest X-ray report dataset (CCXRD) containing 51,262 report samples. Finally, experiments and analyses on a subset of expert-annotated Chinese chest X-ray reports verified the effectiveness of the proposed labeler.
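To make the evaluation idea concrete, the following is a minimal sketch of how a disease labeler can turn report generation into a label-comparison problem: the same labeler is applied to each generated report and its ground-truth report, and the two label sets are compared. Everything named here is an assumption for illustration only. `toy_labeler`, the `KEYWORDS` table, and the three disease labels are hypothetical stand-ins, and the choice of micro-averaged precision, recall, and F1 is ours; the paper's labeler is a trained dual-BERT classifier, not a keyword matcher.

```python
from typing import Callable, Dict, List, Set

# Hypothetical label vocabulary with keyword rules (illustration only).
# The paper's labeler is a trained dual-BERT classifier; this stand-in
# merely lets the metric below run end to end.
KEYWORDS: Dict[str, List[str]] = {
    "pneumonia": ["肺炎", "pneumonia"],
    "pleural_effusion": ["胸腔积液", "pleural effusion"],
    "cardiomegaly": ["心影增大", "cardiomegaly"],
}


def toy_labeler(report: str) -> Set[str]:
    """Return the set of disease labels whose keywords occur in the report.

    Note: this toy version ignores negation ("no pneumonia" would still
    match), which a real labeler must handle.
    """
    text = report.lower()
    return {
        label
        for label, keywords in KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


def micro_prf(
    generated: List[str],
    references: List[str],
    labeler: Callable[[str], Set[str]],
) -> Dict[str, float]:
    """Micro-averaged precision/recall/F1 over labels extracted from reports."""
    if len(generated) != len(references):
        raise ValueError("generated and references must have the same length")

    tp = fp = fn = 0
    for gen_report, ref_report in zip(generated, references):
        predicted = labeler(gen_report)   # labels found in the generated report
        gold = labeler(ref_report)        # labels found in the ground-truth report
        tp += len(predicted & gold)
        fp += len(predicted - gold)
        fn += len(gold - predicted)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    generated = ["右下肺肺炎，心影增大。", "双侧胸腔积液。"]
    references = ["右下肺肺炎。", "双侧胸腔积液，心影增大。"]
    print(micro_prf(generated, references, toy_labeler))
```

Because the metric only depends on the `labeler` callable, a trained classifier can be swapped in without changing the scoring code. The quality of the resulting scores is bounded by the accuracy of the labeler itself, which is why the paper treats an accurate labeler as a prerequisite for clinical evaluation.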