A High-Performance Genomic Accelerator for Accurate Sequence-to-Graph Alignment Using Dynamic Programming Algorithm

Gang Zeng, Jianfeng Zhu, Yichi Zhang, Ganhui Chen, Zhenhai Yuan, Shaojun Wei, Leibo Liu
DOI: https://doi.org/10.1109/tpds.2023.3325137
IF: 5.3
2024-01-01
IEEE Transactions on Parallel and Distributed Systems
Abstract: The rapid mutation of viruses such as SARS-CoV-2 highlights the urgent need for fast and precise genomic sequencing. Traditional sequencing maps the DNA fragments collected from an individual to a known linear reference genome. A linear reference cannot express the genetic diversity of a population, which leads to mapping bias. Researchers have therefore proposed using a graph reference together with long reads so that mapping bias is avoided as far as possible. However, the graph reference introduces irregular edges, which make memory access a bottleneck during alignment, and long reads quadratically increase the storage pressure of the alignment process. There is thus a pressing need for a high-performance hardware accelerator for accurate sequence-to-graph alignment. This paper presents ASGDP, which is, to the best of our knowledge, the first hardware accelerator designed for aligning reads of arbitrary length to a graph. It is based on the traditional dynamic programming algorithm and supports flexible penalty scoring strategies. ASGDP introduces an efficient memory access pattern in hardware and a hierarchical prediction pruning strategy in the algorithm. This combined software-hardware strategy effectively alleviates the storage bottleneck of multi-edge access and improves the accuracy of pruning. We demonstrate that ASGDP provides significant improvements for sequence-to-graph alignment of long reads. For a typical 10 K long read, a single ASGDP accelerator outperforms state-of-the-art S2G mapping tools by 70.8x, 168.1x.
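To make the dynamic programming formulation concrete, the following is a minimal software sketch of sequence-to-graph alignment. It is not ASGDP's hardware design and contains none of its pruning or memory access optimizations. It assumes a directed acyclic graph whose nodes each hold a single base and are numbered in topological order, linear gap penalties, and an alignment that may start and end at any node. The function name `align_to_graph` and the parameters `bases`, `preds`, `match`, `mismatch`, and `gap` are hypothetical names chosen for illustration.

```python
from typing import List


def align_to_graph(
    read: str,
    bases: List[str],        # bases[v]: the single base stored in node v (assumed layout)
    preds: List[List[int]],  # preds[v]: predecessors of node v, each with index < v
    match: int = 0,
    mismatch: int = 1,
    gap: int = 1,
) -> int:
    """Return the minimum penalty for aligning the whole read to some graph path."""
    n = len(read)
    # Virtual start row: a prefix of the read is inserted before any graph node.
    start = [j * gap for j in range(n + 1)]
    rows: List[List[int]] = []
    best = start[n]  # fallback: the read aligns to no node at all

    for v, base in enumerate(bases):
        # Every incoming edge contributes a candidate row; this multi-edge
        # lookup is the irregular access that a linear reference does not have.
        prev_rows = [start] + [rows[u] for u in preds[v]]
        row = [0] * (n + 1)
        # Column 0: node v is consumed without consuming any read base.
        row[0] = min(p[0] for p in prev_rows) + gap
        for j in range(1, n + 1):
            sub = match if read[j - 1] == base else mismatch
            diagonal = min(p[j - 1] for p in prev_rows) + sub  # align base to node
            skip_node = min(p[j] for p in prev_rows) + gap     # gap in the read
            skip_read = row[j - 1] + gap                       # gap in the graph
            row[j] = min(diagonal, skip_node, skip_read)
        rows.append(row)
        best = min(best, row[n])  # the path may end at any node
    return best


# A small bubble: A -> {C | G} -> T, encoding a single-base variant.
bubble_bases = ["A", "C", "G", "T"]
bubble_preds = [[], [0], [0], [1, 2]]
print(align_to_graph("AGT", bubble_bases, bubble_preds))
```

The table filled here has one row per node and one column per read position, which illustrates why long reads raise storage pressure, and the `min` over predecessor rows is the per-edge access that the abstract identifies as a bottleneck. Changing `match`, `mismatch`, and `gap` gives a simple form of configurable penalty scoring.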