Label Critic: Design Data Before Models

Pedro R. A. S. Bassi, Qilong Wu, Wenxuan Li, Sergio Decherchi, Andrea Cavalli, Alan Yuille, Zongwei Zhou

2024-11-05

Abstract: As medical datasets rapidly expand, creating detailed annotations of different body structures becomes increasingly expensive and time-consuming. We consider that requesting radiologists to create detailed annotations is unnecessarily burdensome and that pre-existing AI models can largely automate this process. Following the spirit of "don't use a sledgehammer on a nut," we find that, rather than creating annotations from scratch, radiologists only have to review and edit errors if the Best-AI Labels have mistakes. To obtain the Best-AI Labels among multiple AI Labels, we developed an automatic tool, called Label Critic, that can assess label quality through tireless pairwise comparisons. Extensive experiments demonstrate that, when incorporated with our developed Image-Prompt pairs, pre-existing Large Vision-Language Models (LVLMs), trained on natural images and texts, achieve 96.5% accuracy when choosing the best label in a pairwise comparison, without extra fine-tuning. By transforming the manual annotation task (30-60 min/scan) into an automatic comparison task (15 sec/scan), we effectively reduce the manual efforts required from radiologists by an order of magnitude. When the Best-AI Labels are sufficiently accurate (81% depending on body structures), they will be directly adopted as the gold-standard annotations for the dataset, with lower-quality AI Labels automatically discarded. Label Critic can also check the label quality of a single AI Label with 71.8% accuracy when no alternatives are available for comparison, prompting radiologists to review and edit if the estimated quality is low (19% depending on body structures).

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

The paper addresses the rising cost of annotating medical datasets: as these datasets rapidly expand, creating detailed annotations of different body structures becomes increasingly expensive and time-consuming. The authors argue that requiring radiologists to create these annotations from scratch is an unnecessary burden and that pre-existing AI models can largely automate the process. Specifically, the paper raises three problems and attempts to solve them:

1. **Reducing the workload of manual annotation**: how to use pre-existing large vision-language models (LVLMs) to turn manual annotation into an automatic comparison task, greatly reducing the time and effort required from radiologists.
2. **Improving the quality of annotation**: how to keep annotation quality high in large-scale datasets, especially for multi-organ segmentation.
3. **Automating error detection and selecting the best annotation**: how to use AI tools to detect annotation errors automatically and to select the best of several AI-generated annotations, reducing the need for manual review.

To this end, the authors developed a tool named **Label Critic**, which assesses annotation quality through pairwise comparisons and achieves 96.5% accuracy in choosing the better annotation of a pair, without additional fine-tuning. Label Critic can also check the quality of a single AI annotation with 71.8% accuracy when no alternative is available for comparison, and it prompts radiologists to review and edit when the estimated quality is low. In this way, the paper turns a manual annotation task of 30-60 minutes per scan into an automatic comparison task of about 15 seconds per scan, reducing the manual effort required from radiologists by an order of magnitude.
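The selection and triage logic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the round-robin win count, the function names `select_best_label` and `needs_review`, the `threshold` value, and the toy judge are all assumptions made for illustration. In the paper, the pairwise judgment comes from an LVLM given Image-Prompt pairs; here a hypothetical comparator function stands in for that call.

```python
from itertools import combinations
from typing import Callable, Sequence

# Hypothetical comparator: returns True if the first label is judged better
# than the second. In Label Critic this role is played by an LVLM; here it is
# just a function argument so the sketch stays self-contained.
Comparator = Callable[[str, str], bool]


def select_best_label(candidates: Sequence[str], first_is_better: Comparator) -> str:
    """Pick the candidate that wins the most pairwise comparisons.

    Round-robin win counting is an assumed aggregation strategy,
    not necessarily the one used in the paper.
    """
    if not candidates:
        raise ValueError("need at least one candidate label")
    wins = {name: 0 for name in candidates}
    for a, b in combinations(candidates, 2):
        winner = a if first_is_better(a, b) else b
        wins[winner] += 1
    # max() keeps the first maximal element, so ties go to the earlier candidate.
    return max(candidates, key=lambda name: wins[name])


def needs_review(estimated_quality: float, threshold: float = 0.5) -> bool:
    """Flag a label for radiologist review when its estimated quality is low.

    The 0.5 threshold is an arbitrary placeholder.
    """
    return estimated_quality < threshold


# Toy stand-in for the LVLM judge: compares made-up quality scores.
toy_scores = {"model_a": 0.62, "model_b": 0.91, "model_c": 0.78}


def toy_judge(a: str, b: str) -> bool:
    return toy_scores[a] >= toy_scores[b]


best = select_best_label(list(toy_scores), toy_judge)
print(best)                            # model_b
print(needs_review(toy_scores[best]))  # False
```

With n candidate labels, a round-robin needs n(n-1)/2 comparisons per structure; a single-elimination scheme would need fewer judge calls but is more sensitive to a single wrong judgment.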
