Rapid GPU-Based Pangenome Graph Layout
Jiajie Li, Jan-Niklas Schmelzle, Yixiao Du, Simon Heumos, Andrea Guarracino, Giulia Guidi, Pjotr Prins, Erik Garrison, Zhiru Zhang
2024-09-02
Abstract: Computational pangenomics is an emerging field that studies genetic variation using a graph structure encompassing multiple genomes. Visualizing pangenome graphs is vital for understanding genome diversity. Yet, handling large graphs can be challenging due to the high computational demands of the graph layout process.
In this work, we conduct a thorough performance characterization of a state-of-the-art pangenome graph layout algorithm, revealing significant data-level parallelism, which makes GPUs a promising option for compute acceleration. However, irregular data access and the algorithm's memory-bound nature present significant hurdles. To overcome these challenges, we develop a solution implementing three key optimizations: a cache-friendly data layout, coalesced random states, and warp merging. Additionally, we propose a quantitative metric for scalable evaluation of pangenome layout quality.
Evaluated on 24 human whole-chromosome pangenomes, our GPU-based solution achieves a 57.3x speedup over the state-of-the-art multithreaded CPU baseline without layout quality loss, reducing execution time from hours to minutes.
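To make the "hours to minutes" claim concrete, the following is a minimal arithmetic sketch, not a measurement from the paper. It applies the reported 57.3x speedup to a hypothetical CPU baseline of two hours; the baseline value and the helper name `gpu_minutes` are illustrative assumptions, and the speedup on any individual chromosome may differ from the reported figure.

```python
# Speedup reported in the abstract for the GPU-based layout.
REPORTED_SPEEDUP = 57.3


def gpu_minutes(cpu_minutes: float, speedup: float = REPORTED_SPEEDUP) -> float:
    """Estimate GPU runtime in minutes from a CPU runtime and a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return cpu_minutes / speedup


# Hypothetical CPU baseline (assumption, not taken from the paper): 2 hours.
baseline_minutes = 2 * 60

print(f"Estimated GPU runtime: {gpu_minutes(baseline_minutes):.1f} min")
# prints "Estimated GPU runtime: 2.1 min"
```

Under this assumed baseline, a layout that takes two hours on the CPU would finish in roughly two minutes on the GPU.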
Distributed, Parallel, and Cluster Computing; Computational Engineering, Finance, and Science; Data Structures and Algorithms