Needleman-Wunsch Attention: A Framework for Enhancing DNA Sequence Embedding

Kyelim Lee, Albert No
DOI: https://doi.org/10.1109/access.2024.3401464
IF: 3.9
2024-05-22
IEEE Access
Abstract: In many biological studies that rely on DNA sequence data, calculating the edit distance between two sequences is a vital component. However, computing the edit distance requires dynamic programming, which can be computationally intensive. To address this challenge, numerous works have focused on embedding sequences into a vector space while preserving the distance metric, so that the edit distance between two sequences is approximated by the distance between their corresponding vectors. In this study, we propose a novel Needleman-Wunsch Attention (NWA) framework for sequence embedding that leverages the relationship between the Needleman-Wunsch (NW) matrix and attention maps to improve the accuracy and efficiency of edit distance approximation methods. Our approach applies to any deep learning-based sequence embedding network, so it serves as a general solution rather than one tied to a particular architecture. We validate the effectiveness of the proposed method by applying it to various existing embedding networks, demonstrating improved edit distance-preserving embedding on a real dataset. The code is publicly available at https://github.com/thisislim/nw-attention/.
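For context on the cost the abstract refers to, the following is a minimal sketch of the textbook dynamic program for edit distance (Levenshtein distance with unit costs). It is not taken from the paper or its repository; the function name `edit_distance` and the unit-cost scheme are assumptions chosen for illustration. The table it fills row by row has the same shape as the matrix filled by Needleman-Wunsch global alignment, and the nested loop is what makes exact computation quadratic in the sequence length.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between a and b (illustrative sketch, not the paper's code)."""
    # prev[j] is the distance between the prefix of `a` processed so far and b[:j].
    # Initially that prefix is empty, so the distance is j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Distance between a[:i] and the empty prefix of b is i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if ca == cb else 1)
            deletion = prev[j] + 1       # drop ca from a
            insertion = curr[j - 1] + 1  # insert cb into a
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # One deletion turns "ACGT" into "AGT".
    print(edit_distance("ACGT", "AGT"))
```

Each pair of sequences costs len(a) * len(b) cell updates, which is the cost that embedding-based approximations aim to avoid by comparing fixed-length vectors instead.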
computer science, information systems; telecommunications; engineering, electrical & electronic