Automated Quality Control in Image Segmentation: Application to the UK Biobank Cardiac MR Imaging Study
Robert Robinson, Vanya V. Valindria, Wenjia Bai, Ozan Oktay, Bernhard Kainz, Hideaki Suzuki, Mihir M. Sanghvi, Nay Aung, José Miguel Paiva, Filip Zemrak, Kenneth Fung, Elena Lukaschuk, Aaron M. Lee, Valentina Carapella, Young Jin Kim, Stefan K. Piechnik, Stefan Neubauer, Steffen E. Petersen, Chris Page, Paul M. Matthews, Daniel Rueckert, Ben Glocker
DOI: https://doi.org/10.48550/arXiv.1901.09351
2019-01-27
Computer Vision and Pattern Recognition
Abstract: Background: The trend towards large-scale studies including population imaging poses new challenges in terms of quality control (QC). This is a particular issue when automatic processing tools, e.g. image segmentation methods, are employed to derive quantitative measures or biomarkers for later analyses. Manual inspection and visual QC of each segmentation is not feasible at large scale. However, it is important to be able to automatically detect when a segmentation method fails, so as to avoid including wrong measurements in subsequent analyses, which could lead to incorrect conclusions. Methods: To overcome this challenge, we explore an approach for predicting segmentation quality based on Reverse Classification Accuracy (RCA), which enables us to discriminate between successful and failed segmentations on a per-case basis. We validate this approach on a new, large-scale, manually annotated set of 4,800 cardiac magnetic resonance (MR) scans. We then apply our method to a large cohort of 7,250 cardiac MR scans on which we have performed manual QC. Results: We report results for predicting segmentation quality metrics, including the Dice Similarity Coefficient (DSC) and surface-distance measures. As initial validation, we present data for 400 scans demonstrating 99% accuracy for classifying low- and high-quality segmentations using predicted DSC scores. As further validation, we show high correlation between real and predicted scores and 95% classification accuracy on 4,800 scans for which manual segmentations were available. We mimic real-world application of the method on 7,250 cardiac MR scans, where we show good agreement between predicted quality metrics and manual visual QC scores. Conclusions: We show that RCA has the potential for accurate and fully automatic segmentation QC on a per-case basis in the context of large-scale population imaging, as in the UK Biobank Imaging Study.
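For readers unfamiliar with the Dice Similarity Coefficient used as a quality metric above, the following is a minimal sketch of how a DSC score can be computed for two binary masks and then thresholded into a low/high quality label. It is an illustration only and not the authors' implementation: the function names, the toy masks, the empty-mask convention, and the cut-off value `threshold=0.7` are assumptions, since the abstract does not state them. The sketch also compares against a known reference mask, whereas RCA predicts the score when no reference is available.

```python
import numpy as np


def dice_coefficient(pred, ref):
    """Dice Similarity Coefficient: 2 * |A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    total = pred.sum() + ref.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention assumed here).
        return 1.0
    return float(2.0 * np.logical_and(pred, ref).sum() / total)


def is_good_quality(dsc, threshold=0.7):
    """Label a segmentation as high quality if its DSC reaches the cut-off.

    The default threshold is a hypothetical value chosen for illustration.
    """
    return dsc >= threshold


# Toy 2x3 masks: 3 foreground pixels each, 2 of them overlapping.
pred = np.array([[0, 1, 1],
                 [0, 1, 0]])
ref = np.array([[0, 1, 0],
                [0, 1, 1]])

dsc = dice_coefficient(pred, ref)
print(f"DSC = {dsc:.3f}, high quality: {is_good_quality(dsc)}")
```

In a QC setting of the kind described above, the same thresholding step would be applied to the predicted DSC rather than to one computed against a manual segmentation.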