A Scored Non-Deterministic Finite Automata Processor for Sequence Alignment

Ryan Karbowniczak, Rasha Karakchi
2024-10-11
Abstract: The rapid growth of symbolic data in areas such as the internet, biology, and finance has increased the demand for efficient pattern matching and regular expression processing. Non-deterministic Finite Automata (NFAs) are commonly used for these tasks, but general-purpose platforms often face memory bottlenecks because NFAs keep many states active concurrently. To address this, Domain-Specific Architectures (DSAs) such as FPGA- and ASIC-based automata processors have been developed for improved efficiency. However, many modern applications, such as DNA sequence alignment, require identifying the optimal match path, which demands a scoring method to evaluate the best match. This work enhances the FPGA-based NAPOLY automata processor by integrating scoring capabilities. The extended version, NAPOLY+, assigns weights to transitions so that the highest-scoring path can be identified. Implementing this approach introduces challenges, including increased state-space complexity and higher resource demands caused by multiple active paths. NAPOLY+ addresses these by incorporating arithmetic components that calculate scores along paths and by using efficient memory management to maintain scalability. Experimental evaluation on the Zynq UltraScale+ ZCU104 FPGA showed high device utilization, with performance varying with array size and fan-out. The results are preliminary; ongoing testing will include real datasets to assess the end-to-end performance of NAPOLY+ in practical applications such as BLAST.
Distributed, Parallel, and Cluster Computing; Emerging Technologies
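
The idea in the abstract of assigning weights to transitions and keeping the highest-scoring path can be illustrated with a small software sketch. This is not the NAPOLY+ hardware design. It is a minimal simulation under stated assumptions: the function name best_scored_path, the dictionary-based transition table, the additive scoring rule, and the toy automaton are all hypothetical and chosen only for illustration. For each input symbol, every active state follows its weighted transitions, and when several paths reach the same state only the best-scoring one is kept.

```python
def best_scored_path(transitions, start_states, accept_states, symbols):
    """Simulate a scored NFA and return (score, path) for the best accepting path.

    transitions maps (state, symbol) to a list of (next_state, weight) pairs.
    Returns None if no path ends in an accepting state.
    All names and the additive scoring rule are illustrative assumptions.
    """
    # Each active state keeps the best score seen so far and the path that produced it.
    active = {s: (0, [s]) for s in start_states}
    for sym in symbols:
        nxt = {}
        for state, (score, path) in active.items():
            for dst, weight in transitions.get((state, sym), []):
                cand = score + weight
                # Several paths may converge on dst; keep only the highest score.
                if dst not in nxt or cand > nxt[dst][0]:
                    nxt[dst] = (cand, path + [dst])
        active = nxt
        if not active:
            return None  # every path died on this symbol
    finals = [entry for s, entry in active.items() if s in accept_states]
    return max(finals, key=lambda entry: entry[0]) if finals else None


# Hypothetical toy automaton: reading "A" from q0 activates both q1 and q2,
# and both reach q3 on "C" with different transition weights.
TOY_TRANSITIONS = {
    ("q0", "A"): [("q1", 2), ("q2", 1)],
    ("q1", "C"): [("q3", 2)],
    ("q2", "C"): [("q3", 5)],
}

result = best_scored_path(TOY_TRANSITIONS, {"q0"}, {"q3"}, "AC")
print(result)  # (6, ['q0', 'q2', 'q3'])
```

In the toy run, the path through q1 scores 2 + 2 = 4 and the path through q2 scores 1 + 5 = 6, so the second path is kept. Storing one score per active state is one simple way to bound the bookkeeping when many paths are active at once; the abstract does not say whether NAPOLY+ handles this the same way.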