MinimapR: A Parallel Alignment Tool for the Analysis of Large-Scale Third-Generation Sequencing Data

Zihang Wang, Yingbo Cui, Shaoliang Peng, Xiangke Liao, Yangbo Yu
DOI: https://doi.org/10.1016/j.compbiolchem.2022.107735
IF: 3.737
2022-01-01
Computational Biology and Chemistry
Abstract: The development of third-generation sequencing technology has significantly changed the field of genomics. Compared with second-generation sequencing methods, third-generation technologies produce reads around 100 times longer. These longer reads reveal new genomic variations and fill long-standing gaps in the human reference genome. However, their excessive length and high error rate greatly increase both the volume of data and the cost of alignment. Traditional data analysis platforms and serial sequence alignment methods cannot effectively handle large-scale long-read alignment, so there is a critical need for a data analysis platform that delivers fast alignment of large-scale sequence data. High-performance computing platforms, and efficient, scalable algorithms built on them, have significant potential to change how sequences are analyzed. This paper presents minimapR, a multi-level parallel long-read alignment tool based on minimap2, a popular third-generation read aligner. MinimapR is built on Ray, a new high-performance distributed framework that integrates fully with the Python environment and can be installed easily with pip. MinimapR harnesses multiple computing nodes to significantly accelerate alignment without sacrificing sensitivity. Tested on 64 nodes, minimapR achieved a 50-fold speedup with 78% parallel efficiency. The source code and user manual of minimapR are freely available at https://github.com/Geehome/minimapR.
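The two scaling figures in the abstract are consistent with the common definition of parallel efficiency. Under that definition, efficiency E is the speedup S divided by the number of nodes p, so E = S / p = 50 / 64 ≈ 0.78. The sketch below uses this definition, which is an assumption on our part: the abstract does not say which single-node baseline the speedup is measured against. The function name `parallel_efficiency` is hypothetical and is not part of minimapR.

```python
def parallel_efficiency(speedup: float, nodes: int) -> float:
    """Return parallel efficiency E = S / p (assumed definition, not taken from the paper).

    speedup: runtime on one node divided by runtime on `nodes` nodes.
    nodes:   number of computing nodes used in the parallel run.
    """
    if nodes <= 0:
        raise ValueError("nodes must be a positive integer")
    return speedup / nodes


# Figures reported in the abstract: roughly a 50-fold speedup on 64 nodes.
efficiency = parallel_efficiency(50, 64)
print(f"{efficiency:.1%}")  # prints 78.1%
```

A perfectly scaling run would give E = 1.0. The shortfall from 1.0 reflects overheads such as data distribution and result collection across nodes.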