RabbitQCPlus 2.0: More efficient and versatile quality control for sequencing data

Lifeng Yan, Zekun Yin, Hao Zhang, Zhan Zhao, Mingkai Wang, André Müller, Felix Kallenborn, Alexander Wichmann, Yanjie Wei, Beifang Niu, Bertil Schmidt, Weiguo Liu
DOI: https://doi.org/10.1016/j.ymeth.2023.06.007
IF: 4.647
Methods
Abstract: Assessing the quality of sequencing data plays a crucial role in downstream data analysis. However, existing tools often achieve sub-optimal efficiency, especially when dealing with compressed files or performing complicated quality control operations such as over-representation analysis and error correction. We present RabbitQCPlus, an ultra-efficient quality control tool for modern multi-core systems. RabbitQCPlus uses vectorization, memory copy reduction, parallel (de)compression, and optimized data structures to achieve substantial performance gains. It is 1.1 to 5.4 times faster when performing basic quality control operations compared to state-of-the-art applications yet requires fewer compute resources. Moreover, RabbitQCPlus is at least 4 times faster than other applications when processing gzip-compressed FASTQ files and 1.3 times faster with the error correction module turned on. Furthermore, it takes less than 4 minutes to process 280 GB of plain FASTQ sequencing data, while other applications take at least 22 minutes on a 48-core server when enabling the per-read over-representation analysis. C++ sources are available at https://github.com/RabbitBio/RabbitQCPlus.