RabbitTrim: Highly Optimized Trimming of Illumina Sequencing Data on Multi-core Platforms

Mingkai Wang, Zekun Yin, Lifeng Yan, Yang, Fangjin Zhu, Xiaohui Duan, Xin Li, Bertil Schmidt, Weiguo Liu
DOI: https://doi.org/10.1007/978-981-97-5131-0_3
2024-01-01
Abstract: Trimmomatic is a de facto standard trimmer for Illumina sequencing data. However, limited by its sub-optimal implementation, it cannot fully exploit the computational power of common multi-core platforms. Therefore, we propose RabbitTrim, a highly optimized implementation of Trimmomatic based on efficient I/O strategies, parallel (de)compression engines, block-based memory pools, bitwise operations and vectorization techniques. On a 48-core Intel server, RabbitTrim achieves speedups between 1.5x and 3.3x when processing plain FASTQ files and between 3.7x and 8.0x when processing gzip-compressed FASTQ files. Overall, RabbitTrim is able to process 101 GB of gzip-compressed sequencing data in only 5 min, while Trimmomatic requires at least 21 min. The source code is available at https://github.com/RabbitBio/RabbitTrim.