ProtHyena: A fast and efficient foundation protein language model at single amino acid resolution

Yiming Zhang, Manabu Okumura
DOI: https://doi.org/10.1101/2024.01.18.576206
2024-01-22
Abstract: The emergence of self-supervised deep language models has revolutionized natural language processing and has recently extended to biological sequence analysis. Traditional models, primarily based on the Transformer and BERT architectures, are effective across a range of applications. However, these models are inherently constrained by the attention mechanism’s quadratic computational complexity (O(L^2) in the sequence length L), which limits both their efficiency and the length of context they can process. To address these limitations, we introduce ProtHyena, a novel approach that leverages the Hyena operator. This methodology circumvents the constraints imposed by attention mechanisms and reduces the time complexity to subquadratic, enabling the modeling of extra-long protein sequences at the single amino acid level without the need to compress data. ProtHyena achieves, and in many cases exceeds, state-of-the-art results in various downstream tasks with only 10% of the parameters typically required by attention-based models. The architecture of ProtHyena presents a highly efficient solution for training protein predictors, offering a promising avenue for fast and efficient analysis of biological sequences.
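To make the complexity claim concrete, the following is a minimal sketch of one ingredient that Hyena-style operators rely on: evaluating a long causal convolution with the FFT in O(L log L) rather than the O(L^2) of a direct sum. It is not ProtHyena's implementation. The function names (`fft`, `fft_long_conv`, `direct_conv`) and the toy inputs are assumptions chosen for illustration, and the gating and implicit filter parameterization of the full Hyena operator are omitted.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def fft_long_conv(u, h):
    """Causal convolution y[t] = sum_{j<=t} h[j] * u[t-j] in O(L log L)."""
    L = len(u)
    n = 1
    while n < 2 * L:  # zero-pad so circular convolution equals linear convolution
        n *= 2
    U = fft(list(u) + [0.0] * (n - L))
    H = fft(list(h) + [0.0] * (n - len(h)))
    y = fft([a * b for a, b in zip(U, H)], inverse=True)
    return [v.real / n for v in y[:L]]  # normalize the inverse, keep first L outputs


def direct_conv(u, h):
    """Reference O(L^2) causal convolution, for comparison only."""
    L = len(u)
    return [sum(h[j] * u[t - j] for j in range(t + 1)) for t in range(L)]


# Hypothetical toy input: a length-4 "sequence" and a length-4 filter.
u = [1.0, 2.0, 3.0, 4.0]
h = [1.0, 0.5, 0.25, 0.125]
y_fast = fft_long_conv(u, h)
y_slow = direct_conv(u, h)
```

Both functions compute the same output; only the cost differs. The filter here spans the whole sequence, which is what lets a convolution-based layer see long context without forming an L-by-L attention matrix.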
Bioinformatics
What problem does this paper attempt to address?