Label Critic: Design Data Before Models

Pedro R. A. S. Bassi, Qilong Wu, Wenxuan Li, Sergio Decherchi, Andrea Cavalli, Alan Yuille, Zongwei Zhou
2024-11-05
Abstract: As medical datasets rapidly expand, creating detailed annotations of different body structures becomes increasingly expensive and time-consuming. We argue that requiring radiologists to create detailed annotations is unnecessarily burdensome and that pre-existing AI models can largely automate this process. Following the principle "don't use a sledgehammer to crack a nut", we find that, rather than creating annotations from scratch, radiologists only have to review and edit the Best-AI Labels when they contain mistakes. To obtain the Best-AI Labels among multiple AI Labels, we developed an automatic tool, called Label Critic, that assesses label quality through tireless pairwise comparisons. Extensive experiments demonstrate that, when combined with our Image-Prompt pairs, pre-existing Large Vision-Language Models (LVLMs), trained on natural images and texts, achieve 96.5% accuracy in choosing the better label in a pairwise comparison, without extra fine-tuning. By transforming the manual annotation task (30-60 min/scan) into an automatic comparison task (15 sec/scan), we reduce the manual effort required from radiologists by an order of magnitude. When the Best-AI Labels are sufficiently accurate (81% of cases, varying by body structure), they are directly adopted as the gold-standard annotations for the dataset, and the lower-quality AI Labels are automatically discarded. Label Critic can also check the quality of a single AI Label with 71.8% accuracy when no alternatives are available for comparison, prompting radiologists to review and edit it if the estimated quality is low (19% of cases, varying by body structure).
Computer Vision and Pattern Recognition
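The abstract does not spell out how pairwise verdicts are aggregated when more than two AI Labels compete for the same body structure. The sketch below assumes a simple running-champion scheme: the current best label is compared with each remaining candidate and replaced whenever the judge prefers the challenger. The `AILabel` container, `select_best_label` and `oracle_judge` are hypothetical names; in Label Critic the judge would be an LVLM queried with an Image-Prompt pair, which is replaced here by a toy comparison on made-up quality scores.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class AILabel:
    """Hypothetical container for one AI-generated segmentation of a body structure."""
    source_model: str
    quality: float  # toy "true" quality, used only by the simulated judge below


# A pairwise judge returns True if it prefers label `a` over label `b`.
# In Label Critic this role is played by an LVLM queried with an Image-Prompt pair;
# here it is just a callable so the selection logic can run on its own.
PairwiseJudge = Callable[[AILabel, AILabel], bool]


def select_best_label(
    candidates: Sequence[AILabel], prefers_first: PairwiseJudge
) -> Tuple[AILabel, int]:
    """Running-champion selection (an assumed aggregation scheme).

    The current champion is challenged by every other candidate and replaced
    whenever the judge prefers the challenger. Uses len(candidates) - 1
    comparisons; returns the winner and the number of comparisons made.
    """
    if not candidates:
        raise ValueError("need at least one candidate label")
    champion = candidates[0]
    comparisons = 0
    for challenger in candidates[1:]:
        comparisons += 1
        if not prefers_first(champion, challenger):
            champion = challenger
    return champion, comparisons


def oracle_judge(a: AILabel, b: AILabel) -> bool:
    """Toy stand-in for the LVLM: prefers the label with the higher made-up quality."""
    return a.quality >= b.quality


labels = [AILabel("model_A", 0.82), AILabel("model_B", 0.91), AILabel("model_C", 0.77)]
best, n = select_best_label(labels, oracle_judge)
print(best.source_model, n)  # model_B 2
```

This scheme needs only N-1 comparisons for N candidates, but with an imperfect judge (96.5% pairwise accuracy rather than 100%) a single wrong verdict can eliminate the truly best label; a round-robin that counts wins over all N(N-1)/2 pairs costs more queries but is less sensitive to the order in which candidates are compared.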
What problem does this paper attempt to address?
The main problem this paper addresses is that, as medical datasets expand rapidly, creating detailed annotations (especially of different body structures) has become increasingly expensive and time-consuming. The authors argue that requiring radiologists to create these annotations from scratch is an unnecessary burden, since existing AI models can largely automate the process. Specifically, the paper raises and tackles the following problems:

1. **Reducing the manual annotation workload**: how to use pre-trained large vision-language models (LVLMs) to turn manual annotation into automatic comparison, greatly reducing the time and effort radiologists must spend.
2. **Maintaining annotation quality**: how to keep annotation quality high across large-scale datasets, especially for multi-organ segmentation.
3. **Automating error detection and best-label selection**: how to use AI tools to detect annotation errors automatically and select the best among multiple AI-generated annotations, reducing the need for manual review.

To this end, the authors developed a tool named **Label Critic**, which evaluates annotation quality through pairwise comparison and reaches up to 96.5% accuracy in selecting the better annotation, without additional fine-tuning. Label Critic can also check the quality of a single AI annotation with 71.8% accuracy and prompt radiologists to review and edit it when the estimated quality is low. With this approach, a manual annotation task that originally took 30-60 minutes per scan becomes an automatic comparison task taking only 15 seconds per scan, reducing the required human effort by an order of magnitude.
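The routing described above (adopt sufficiently accurate Best-AI Labels, send low-quality ones to radiologists) can be sketched as a triage loop over scans. This is a minimal illustration under stated assumptions: `triage`, `TriageResult`, `pick_best` and `is_acceptable` are hypothetical names, the per-label quality scores and the 0.7 acceptance threshold are invented toy values, and using the single-label check to decide whether the Best-AI Label is "sufficiently accurate" is an assumption rather than the paper's documented procedure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class TriageResult:
    """Hypothetical container for the outcome of one triage pass."""
    adopted: Dict[str, str] = field(default_factory=dict)  # scan_id -> label adopted as gold standard
    needs_review: List[str] = field(default_factory=list)  # scan_ids routed to radiologists


def triage(
    scans: Dict[str, Sequence[str]],
    pick_best: Callable[[Sequence[str]], str],
    is_acceptable: Callable[[str], bool],
) -> TriageResult:
    """Route each scan's AI Labels to 'adopt' or 'radiologist review'.

    pick_best stands in for pairwise selection among several AI Labels;
    is_acceptable stands in for the single-label quality check (both are assumptions).
    Labels that lose the comparison are simply not kept, i.e. they are discarded.
    """
    result = TriageResult()
    for scan_id, label_ids in scans.items():
        if not label_ids:
            # No AI Label at all: nothing to check, so a radiologist must annotate.
            result.needs_review.append(scan_id)
            continue
        # With one label there is nothing to compare; otherwise pick the Best-AI Label.
        best = label_ids[0] if len(label_ids) == 1 else pick_best(label_ids)
        if is_acceptable(best):
            result.adopted[scan_id] = best
        else:
            result.needs_review.append(scan_id)
    return result


# Toy quality scores standing in for LVLM judgments (invented values).
toy_quality = {"s1_a": 0.90, "s1_b": 0.60, "s2_a": 0.40, "s3_a": 0.95, "s3_b": 0.97}

res = triage(
    {"s1": ["s1_a", "s1_b"], "s2": ["s2_a"], "s3": ["s3_a", "s3_b"]},
    pick_best=lambda ids: max(ids, key=toy_quality.__getitem__),
    is_acceptable=lambda label_id: toy_quality[label_id] >= 0.7,  # hypothetical threshold
)
print(res.adopted)       # {'s1': 's1_a', 's3': 's3_b'}
print(res.needs_review)  # ['s2']
```

Only the scans in `needs_review` reach a radiologist; in this toy run that is one scan out of three, and the losing labels for `s1` and `s3` are never stored.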