NEAR: Neural Embeddings for Amino acid Relationships

Daniel R. Olson, Daphne Demekas, Thomas Colligan, Travis J. Wheeler
DOI: https://doi.org/10.1101/2024.01.25.577287
2024-01-30
Abstract: We present NEAR, a method based on representation learning that is designed to rapidly identify good sequence alignment candidates from a large protein database. NEAR’s neural embedding model computes per-residue embeddings for target and query protein sequences, and identifies alignment candidates with a pipeline consisting of k-NN search, filtration, and neighbor aggregation. NEAR’s ResNet embedding model is trained using an N-pairs loss function guided by sequence alignments generated by a widely used alignment tool. Benchmarking results show improved performance relative to state-of-the-art neural embedding models developed specifically for protein sequences, as well as greater speed than the alignment-based filtering strategy used in a sensitive alignment pipeline.
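The three pipeline stages named in the abstract (k-NN search, filtration, neighbor aggregation) can be illustrated with a minimal sketch. This is not NEAR's implementation: the function name `rank_targets`, the brute-force cosine search (a real system would likely use an approximate nearest-neighbor index), the similarity threshold `min_sim`, and the sum-of-similarities aggregation are all assumptions made for illustration. The embeddings are assumed to be precomputed, nonzero per-residue vectors.

```python
import numpy as np
from collections import defaultdict


def rank_targets(query_emb, target_embs, k=2, min_sim=0.5):
    """Rank target sequences as alignment candidates for one query.

    query_emb:   (Lq, d) array of per-residue query embeddings.
    target_embs: dict mapping target name -> (Lt, d) array.
    k, min_sim:  hypothetical parameters, not taken from the paper.
    """
    names = list(target_embs)
    # Stack every target residue into one matrix; remember which target owns each row.
    owners = np.concatenate(
        [np.full(len(target_embs[n]), i) for i, n in enumerate(names)]
    )
    bank = np.concatenate([target_embs[n] for n in names])

    # L2-normalise so that the dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
    sims = q @ b.T  # shape (Lq, total target residues)

    # Stage 1: brute-force k-NN per query residue.
    k = min(k, sims.shape[1])
    top = np.argpartition(-sims, k - 1, axis=1)[:, :k]

    scores = defaultdict(float)
    for qi in range(sims.shape[0]):
        for ti in top[qi]:
            s = sims[qi, ti]
            # Stage 2: filtration, dropping weak neighbor hits.
            if s >= min_sim:
                # Stage 3: aggregation, summing hit scores per target sequence.
                scores[names[owners[ti]]] += float(s)

    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Targets whose residues repeatedly appear among the query's nearest neighbors accumulate the highest scores and would be passed on as alignment candidates; targets with no surviving hits are omitted from the ranking.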
Bioinformatics