More Than Positive and Negative: Communicating Fine Granularity in Medical Diagnosis

Xiangyu Peng, Kai Wang, Jianfei Yang, Yingying Zhu, Yang You
2024-08-05
Abstract: With the advance of deep learning, much progress has been made in building powerful artificial intelligence (AI) systems for automatic Chest X-ray (CXR) analysis. Most existing AI models are trained to be a binary classifier with the aim of distinguishing positive and negative cases. However, a large gap exists between the simple binary setting and complicated real-world medical scenarios. In this work, we reinvestigate the problem of automatic radiology diagnosis. We first observe that there is considerable diversity among cases within the positive class, which means simply classifying them as positive loses many important details. This motivates us to build AI models that can communicate fine-grained knowledge from medical images like human experts. To this end, we first propose a new benchmark on fine granularity learning from medical images. Specifically, we devise a division rule based on medical knowledge to divide positive cases into two subcategories, namely atypical positive and typical positive. Then, we propose a new metric termed AUC$^\text{FG}$ on the two subcategories for evaluation of the ability to separate them apart. With the proposed benchmark, we encourage the community to develop AI diagnosis systems that could better learn fine granularity from medical images. Last, we propose a simple risk modulation approach to this problem by only using coarse labels in training. Empirical results show that despite its simplicity, the proposed method achieves superior performance and thus serves as a strong baseline.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the following problem: existing artificial intelligence (AI) models for automatic chest X-ray (CXR) analysis rely too heavily on binary classification (distinguishing positive from negative cases). As a result, they cannot capture the subtle differences within positive cases, which limits their accuracy and practicality in complex real-world medical scenarios. Specifically, the authors point out:

1. **Limitations of existing models**:
   - Most existing AI models are trained as binary classifiers that distinguish positive from negative cases. However, there is a large gap between this simple binary setting and complex real-world medical scenarios.
   - Positive cases are highly diverse, and simply labeling them all as positive discards many important details.
2. **The core of the problem**:
   - The authors observe that some positive cases have very mild symptoms (such as "mild pulmonary effusion"), while others are very severe (such as "severe pulmonary consolidation"). In addition, some cases have improved (such as "significant improvement in pulmonary consolidation in the right upper lobe"), while others have deteriorated (such as "significant increase in left lung atelectasis").
   - Ignoring these differences may make the model inconsistent with the evaluation criteria of human experts and makes misjudgments likely, especially on borderline cases.
3. **New problems proposed**:
   - How can AI models be built that communicate fine-grained knowledge from medical images and so better reflect the actual clinical situation?
   - How can a new benchmark and evaluation metric be designed to measure more accurately a model's ability to capture the differences within positive cases?

To this end, the authors propose the following solutions:

- **New benchmark and evaluation metric**:
  - A division rule based on medical knowledge splits positive cases into two subcategories: atypical positive and typical positive.
  - A new evaluation metric, AUC$^\text{FG}$ (area under the curve for fine granularity), measures the model's ability to distinguish atypical positive from typical positive cases.
- **Risk modulation method**:
  - A simple but effective method applies risk modulation to the cross-entropy loss (the PCE loss) while training on coarse labels only. This reduces the model's over-fitting to difficult samples and thereby improves its ability to learn the differences within positive cases.

Through these contributions, the authors hope to encourage the community to develop finer-grained AI diagnosis systems that better match clinical needs and better serve real medical applications.
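
To make the evaluation idea concrete, the sketch below computes an AUC restricted to the two positive subcategories, which is how the summary above describes AUC$^\text{FG}$. It is an illustration under stated assumptions, not the authors' implementation: the function name `auc_fg`, the example scores, and the convention that typical positives should receive higher scores than atypical positives are all hypothetical choices made here.

```python
from typing import Sequence


def auc_fg(scores_atypical: Sequence[float], scores_typical: Sequence[float]) -> float:
    """AUC over the two positive subcategories only (negatives are excluded).

    Assumption (not taken from the paper): typical positives play the role
    of the "positive" class, so the value is the fraction of
    (typical, atypical) pairs in which the typical case gets the higher
    model score. Ties count as half a correct pair.
    """
    if len(scores_atypical) == 0 or len(scores_typical) == 0:
        raise ValueError("both subcategories need at least one case")

    correct_pairs = 0.0
    for t in scores_typical:
        for a in scores_atypical:
            if t > a:
                correct_pairs += 1.0
            elif t == a:
                correct_pairs += 0.5
    return correct_pairs / (len(scores_typical) * len(scores_atypical))


# Hypothetical model scores for positive cases, grouped by subcategory.
atypical_scores = [0.55, 0.60, 0.72]
typical_scores = [0.70, 0.90, 0.95]

print(auc_fg(atypical_scores, typical_scores))
```

In this toy example, 8 of the 9 (typical, atypical) pairs are ordered correctly, so the value is 8/9. A binary classifier that scores every positive case identically would get 0.5 here regardless of how well it separates positives from negatives, which is the gap the fine-granularity benchmark is meant to expose.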