Expert-Level Annotation Quality Achieved by Gamified Crowdsourcing for B-line Segmentation in Lung Ultrasound

Mike Jin,Nicole M Duggan,Varoon Bashyakarla,Maria Alejandra Duran Mendicuti,Stephen Hallisey,Denie Bernier,Joseph Stegeman,Erik Duhaime,Tina Kapur,Andrew J Goldsmith
2023-12-16
Abstract:Accurate and scalable annotation of medical data is critical for the development of medical AI, but obtaining time for annotation from medical experts is challenging. Gamified crowdsourcing has demonstrated potential for obtaining highly accurate annotations for medical data at scale, and we demonstrate the same in this study for the segmentation of B-lines, an indicator of pulmonary congestion, on still frames within point-of-care lung ultrasound clips. We collected 21,154 annotations from 214 annotators over 2.5 days, and we demonstrated that the concordance of crowd consensus segmentations with reference standards exceeds that of individual experts with the same reference standards, both in terms of B-line count (mean squared error 0.239 vs. 0.308, p<0.05) as well as the spatial precision of B-line annotations (mean Dice-H score 0.755 vs. 0.643, p<0.05). These results suggest that expert-quality segmentations can be achieved using gamified crowdsourcing.
Computers and Society
What problem does this paper attempt to address?