Accelerating spliced alignment of long RNA sequencing reads using parallel maximal exact match retrieval
Rongxing Wang,Yanju Zhang
DOI: https://doi.org/10.1016/j.compbiomed.2024.108542
IF: 7.7
2024-05-04
Computers in Biology and Medicine
Abstract:The genomics landscape has undergone a revolutionary transformation with the emergence of third-generation sequencing technologies. Fueled by the exponential surge in sequencing data, there is an urgent demand for accurate and rapid algorithms to effectively handle this burgeoning influx. Under such circumstances, we developed a parallelized, yet accuracy-lossless algorithm for maximal exact match (MEM) retrieval to strategically address the computational bottleneck of uLTRA, a leading spliced alignment algorithm known for its precision in handling long RNA sequencing (RNA-seq) reads. The design of the algorithm incorporates a multi-threaded strategy, enabling the concurrent processing of multiple reads simultaneously. Additionally, we implemented the serialization of index required for MEM retrieval to facilitate its reuse, resulting in accelerated startup for practical tasks. Extensive experiments demonstrate that our parallel algorithm achieves significant improvements in runtime, speedup, throughput, and memory usage. When applied to the largest human dataset, the algorithm achieves an impressive speedup of 10.78×, significantly improving throughput on a large scale. Moreover, the integration of the parallel MEM retrieval algorithm into the uLTRA pipeline introduces a dual-layered parallel capability, consistently yielding a speedup of 4.99× compared to the multi-process and single-threaded execution of uLTRA. The thorough analysis of experimental results underscores the adept utilization of parallel processing capabilities and its advantageous performance in handling large datasets. This study provides a showcase of parallelized strategies for MEM retrieval within the context of spliced alignment algorithm, effectively facilitating the process of RNA-seq data analysis. The code is available at https://github.com/RongxingWong/AcceleratingSplicedAlignment .
computer science, interdisciplinary applications,engineering, biomedical,biology,mathematical & computational biology