Scalable quality control on processing of large diffusion-weighted and structural magnetic resonance imaging datasets

Michael E. Kim, Chenyu Gao, Karthik Ramadass, Praitayini Kanakaraj, Nancy R. Newlin, Gaurav Rudravaram, Kurt G. Schilling, Blake E. Dewey, David A. Bennett, Sid O'Bryant, Robert C. Barber, Derek Archer, Timothy J. Hohman, Shunxing Bao, Zhiyuan Li, Bennett A. Landman, Nazirah Mohd Khairi, Alzheimer's Disease Neuroimaging Initiative, HABS-HD Study Team
2024-09-26
Abstract: Proper quality control (QC) is time consuming when working with large-scale medical imaging datasets, yet necessary, as poor-quality data can lead to erroneous conclusions or poorly trained machine learning models. Most efforts to reduce data QC time rely on outlier detection, which cannot capture every instance of algorithm failure. Thus, there is a need to visually inspect every output of data processing pipelines in a scalable manner. We design a QC pipeline that allows for low time cost and effort across a team setting for a large database of diffusion-weighted and structural magnetic resonance images. Our proposed method satisfies the following design criteria: 1.) a consistent way to perform and manage quality control across a team of researchers, 2.) quick visualization of preprocessed data that minimizes the effort and time spent on the QC process without compromising the quality of the QC, and 3.) a way to aggregate QC results across pipelines and datasets that can be easily shared. In addition to meeting these design criteria, we also provide information on what a successful output should look like and on common algorithm failures for various processing pipelines. Our method reduces the time spent on QC by a factor of over 20 when compared to naively opening outputs in an image viewer, and we demonstrate how it can facilitate aggregation and sharing of QC results within a team. While researchers must spend time on robust visual QC of data, there are mechanisms by which the process can be streamlined and made efficient.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses quality control (QC) in the processing of large-scale diffusion-weighted and structural magnetic resonance imaging datasets. Specifically, it focuses on how to perform QC efficiently and consistently across a large volume of medical imaging data, so that poor-quality data do not lead to erroneous research conclusions or poorly trained machine learning models. Most existing efforts to reduce QC time rely on outlier detection, which cannot capture every instance of algorithm failure, so a scalable way to visually inspect every output of each data processing pipeline is required. The paper proposes a QC pipeline that meets the following design criteria:

1. **Consistent approach**: A consistent way to perform and manage QC across a team of researchers.
2. **Rapid visualization**: Quick visualization of preprocessed data that minimizes the time and effort spent on QC without lowering QC standards.
3. **Result aggregation**: An easily shared way to aggregate QC results across pipelines and datasets.

The paper also describes what a successful output should look like and the common algorithm failures for various processing pipelines, to help researchers perform QC more effectively. Compared with naively opening outputs in an image viewer, the method reduces the time spent on QC by a factor of more than 20. It also facilitates the aggregation and sharing of QC results within a team.
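The result-aggregation criterion can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `QCRecord` fields, the `pass`/`fail`/`unsure` status labels, and the helper names `aggregate`, `summarize`, and `to_csv` are all hypothetical, chosen only to show one way per-reviewer QC decisions might be merged into a single shareable table.

```python
import csv
import io
from collections import Counter
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class QCRecord:
    """One reviewer decision for one pipeline output (hypothetical schema)."""
    dataset: str
    subject: str
    session: str
    pipeline: str
    status: str      # assumed labels: "pass", "fail", "unsure"
    reviewer: str
    reason: str = ""


def aggregate(records):
    """Keep one record per (dataset, subject, session, pipeline).

    Later entries override earlier ones, so a re-review replaces the
    original decision while keeping its position in the table.
    """
    latest = {}
    for rec in records:
        latest[(rec.dataset, rec.subject, rec.session, rec.pipeline)] = rec
    return list(latest.values())


def summarize(records):
    """Count decisions per (dataset, pipeline, status)."""
    return dict(Counter((r.dataset, r.pipeline, r.status) for r in records))


def to_csv(records):
    """Serialize records to CSV text that can be shared within a team."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(QCRecord)],
        lineterminator="\n",
    )
    writer.writeheader()
    for rec in records:
        writer.writerow(asdict(rec))
    return buf.getvalue()


# Made-up example entries; dataset and pipeline names are placeholders.
records = [
    QCRecord("DatasetA", "sub-01", "ses-01", "preproc", "pass", "reviewer1"),
    QCRecord("DatasetA", "sub-02", "ses-01", "preproc", "fail", "reviewer2",
             "placeholder failure note"),
    QCRecord("DatasetA", "sub-01", "ses-01", "preproc", "fail", "reviewer2",
             "re-reviewed"),
]
merged = aggregate(records)
summary = summarize(merged)
table = to_csv(merged)
```

Keying each record on dataset, subject, session, and pipeline is one plausible design choice: it lets results from different reviewers and pipelines land in a single table without duplicate rows, which is the property the third design criterion asks for.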