Improving the Thread Scalability and Parallelism of BWA-MEM on Intel HPC Platforms.

Xinyuan Li, Lin Xu, Jian Zhang
DOI: https://doi.org/10.1109/HPCC/SmartCity/DSS.2019.00256
2019-01-01
Abstract: With modern next-generation sequencing (NGS) technology, genome sequencing can process much more data at low cost. Burrows-Wheeler Aligner (BWA) is one of the most widely used open-source software tools for aligning read sequences against a reference genome, and sequence alignment is the most time-consuming step in NGS data analysis. BWA-MEM is one of the most popular tools in the BWA package. It is a memory-bound application, and cache misses have traditionally been regarded as one of its most serious problems. As the number of cores and threads on HPC platforms grows, however, low parallelism and poor thread scalability become more serious and strongly affect performance. Despite extensive optimization efforts, the thread scalability and parallelism of BWA-MEM remain inefficient, because most optimization work does not take advantage of the architectural characteristics of the hardware. After analyzing BWA-MEM's performance on modern Intel HPC platforms, we find that BWA-MEM does not fully support the multi-threading mechanism. Based on this analysis, we propose optimizations to BWA-MEM that focus on improving thread scalability and parallelism in order to exploit the computing resources of many-core architectures. First, we reorganize the application's pipeline and give it a dynamic thread allocation strategy to make the pipeline more efficient. Second, we optimize the sequence processing section by improving the memory allocation and the dynamic thread scheduler of the calculation part. With our optimizations, BWA-MEM runs ~2x faster on Xeon and ~3x faster on Xeon Phi processors than the original implementation.
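The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind dynamic thread scheduling: idle workers claim the next chunk of reads from a shared counter instead of receiving a fixed share up front. It is not the authors' code. The names `dynamic_map`, `work`, and `chunk` are hypothetical, and Python threads are used only to show the scheduling logic, not to demonstrate a speedup.

```python
import threading

def dynamic_map(items, work, n_threads=4, chunk=2):
    """Apply `work` to every item; each idle thread claims the next chunk itself."""
    results = [None] * len(items)
    next_index = 0              # index of the first unclaimed item (shared state)
    lock = threading.Lock()     # protects next_index

    def worker():
        nonlocal next_index
        while True:
            # Claim a chunk atomically; fast threads simply come back more often.
            with lock:
                start = next_index
                next_index += chunk
            if start >= len(items):
                return          # nothing left to claim
            for i in range(start, min(start + chunk, len(items))):
                results[i] = work(items[i])

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Hypothetical stand-in for per-read alignment work: just measure read length.
reads = ["ACGT", "GGCATT", "T", "ACGTACGTAC", "CC"]
lengths = dynamic_map(reads, len, n_threads=3, chunk=2)
```

A smaller `chunk` balances load more evenly when reads vary in cost, at the price of more contention on the shared counter; a larger `chunk` does the opposite.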