Accelerating whole-genome alignment in the age of complete genome assemblies

Ghanshyam Chandra, Md. Vasimuddin, Sanchit Misra, Chirag Jain
DOI: https://doi.org/10.1101/2024.11.25.625328
2024-11-28
Abstract: Recent advancements in long-read sequencing and assembly methods have ushered in an era of high-quality genome assemblies. Modern assemblies commonly feature megabase-long sequences that frequently span entire chromosomes. The increase in assembly contiguity and the reduced number of contigs also imply that whole-genome alignment is no longer an embarrassingly parallel problem. The conventional way to align the sequences of a query genome in parallel is to assign a single thread to each sequence, which results in poor CPU utilization and long runtimes. In this work, we designed optimizations to accelerate whole-genome alignment on multi-core processors and implemented them in a commonly used aligner, minimap2. Our improvements include a fine-grained parallel chaining method and a fast mechanism for differentiating primary and secondary chains. Our approach accelerates alignment of human, plant, and primate genomes by 1.6x to 7.2x without compromising accuracy.
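The utilization problem described in the abstract can be illustrated with a small back-of-the-envelope sketch. It is not taken from the paper or from minimap2. It assumes, purely for illustration, that alignment work is proportional to sequence length, and the function name and the sequence lengths below are hypothetical. When each sequence is handled by one thread, the run cannot finish before the longest sequence does, so the speedup is capped at the total work divided by the largest single piece of work.

```python
def per_sequence_speedup_bound(work, threads):
    """Upper bound on speedup when each sequence is processed by one thread.

    work: per-sequence work estimates (assumed proportional to length).
    threads: number of available threads.
    """
    total = sum(work)
    # No schedule can finish before the longest single job completes,
    # nor faster than perfectly balanced work across all threads.
    makespan_lower_bound = max(max(work), total / threads)
    return total / makespan_lower_bound


# Hypothetical chromosome-scale sequence lengths in Mbp (illustrative only).
lengths = [250, 200, 150, 100, 50, 50]
bound_32 = per_sequence_speedup_bound(lengths, threads=32)
bound_2 = per_sequence_speedup_bound(lengths, threads=2)
print(bound_32, bound_2)
```

Under these assumptions, adding threads beyond a handful does not help: the longest sequence dominates the runtime, which is why parallelism within a single sequence becomes necessary.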
Bioinformatics