Protein-Nucleic Acid Complex Modeling with Frame Averaging Transformer

Tinglin Huang, Zhenqiao Song, Rex Ying, Wengong Jin
2024-11-04
Abstract: Nucleic acid-based drugs like aptamers have recently demonstrated great therapeutic potential. However, experimental platforms for aptamer screening are costly, and the scarcity of labeled data presents a challenge for supervised methods to learn protein-aptamer binding. To this end, we develop an unsupervised learning approach based on the predicted pairwise contact map between a protein and a nucleic acid and demonstrate its effectiveness in protein-aptamer binding prediction. Our model is based on FAFormer, a novel equivariant transformer architecture that seamlessly integrates frame averaging (FA) within each transformer block. This integration allows our model to infuse geometric information into node features while preserving the spatial semantics of coordinates, leading to greater expressive power than standard FA models. Our results show that FAFormer outperforms existing equivariant models in contact map prediction across three protein complex datasets, with over 10% relative improvement. Moreover, we curate five real-world protein-aptamer interaction datasets and show that the contact map predicted by FAFormer serves as a strong binding indicator for aptamer screening.
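To make the contact-map idea concrete, the following is a minimal sketch rather than the authors' pipeline. It assumes one representative 3D coordinate per residue and per nucleotide, a hypothetical 6 Å distance cutoff for labeling contacts, and the maximum predicted contact probability as the scalar binding score; none of these choices are specified in the abstract. The function names are illustrative.

```python
import numpy as np


def contact_map(protein_xyz, nucleic_xyz, cutoff=6.0):
    """Binary (n_residues, n_nucleotides) map of pairs closer than `cutoff`.

    The 6.0 cutoff (in angstroms) is an assumed value for illustration.
    """
    protein_xyz = np.asarray(protein_xyz, dtype=float)
    nucleic_xyz = np.asarray(nucleic_xyz, dtype=float)
    # Pairwise differences via broadcasting: (n_res, 1, 3) - (1, n_nuc, 3).
    diff = protein_xyz[:, None, :] - nucleic_xyz[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return (dist < cutoff).astype(np.float32)


def binding_score(contact_probs):
    """Collapse a predicted contact-probability map into one scalar.

    Taking the maximum is an assumption; other aggregates are possible.
    """
    return float(np.max(np.asarray(contact_probs, dtype=float)))


def rank_aptamers(prob_maps):
    """Sort aptamer names from highest to lowest binding score."""
    scores = {name: binding_score(probs) for name, probs in prob_maps.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Because the score is derived only from predicted contacts, ranking candidates this way needs no binding-affinity labels, which is the sense in which the screening is unsupervised.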
Biomolecules
What problem does this paper attempt to address?
The paper addresses two linked problems: predicting the contact map between a protein and a nucleic acid, and using that prediction to screen aptamers without binding labels. Specifically:

1. **Protein-Nucleic Acid Contact Map Prediction**: The paper predicts the residue-nucleotide contact map of a complex, which identifies the binding sites between the protein and the nucleic acid. This task matters in biomedical research because understanding these binding patterns helps to discover new drug targets.
2. **Unsupervised Aptamer Screening**: Aptamers are single-stranded nucleic acids that can bind with high affinity and specificity to various molecules, including targets that are difficult to drug with traditional methods. However, existing high-throughput screening methods are time-consuming and labor-intensive. The paper proposes an unsupervised method that estimates how well an aptamer binds a target protein from the predicted contact map, thereby accelerating aptamer screening.

To achieve these goals, the paper introduces FAFormer, a new equivariant transformer architecture. FAFormer integrates a frame averaging (FA) module into each transformer block to incorporate geometric information into node features while preserving the spatial semantics of the coordinates, which gives it greater expressive power than standard FA models. Experimental results show that FAFormer outperforms existing equivariant models in contact map prediction across three protein complex datasets, with over 10% relative improvement, and that its predicted contact maps serve as a strong binding indicator on five curated protein-aptamer interaction datasets.
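For readers unfamiliar with frame averaging, the sketch below illustrates the generic technique that FAFormer builds on; it is not the FAFormer block itself. It assumes the common construction in which frames are the eight sign-flipped sets of principal axes of the centered coordinates, and it uses `toy_encoder`, a hypothetical stand-in for a learned layer, to show that averaging over frames turns an arbitrary function into one that is invariant to rotations, reflections, and translations.

```python
import itertools

import numpy as np


def pca_frames(coords):
    """Return the centroid and the 8 sign-flipped PCA frames of an (n, 3) cloud."""
    centroid = coords.mean(axis=0)
    centered = coords - centroid
    # Eigenvectors of the 3x3 scatter matrix are the principal axes (columns).
    _, axes = np.linalg.eigh(centered.T @ centered)
    # Each axis is only defined up to sign, so enumerate all 2^3 sign choices.
    frames = [axes * np.array(signs)
              for signs in itertools.product([1.0, -1.0], repeat=3)]
    return centroid, np.stack(frames)


def frame_average_invariant(f, coords):
    """Average f over all frames; the result is invariant to rigid motions."""
    centroid, frames = pca_frames(coords)
    centered = coords - centroid
    # Project the centered points into each frame before applying f.
    return np.mean([f(centered @ frame) for frame in frames], axis=0)


rng = np.random.default_rng(0)
weights = rng.normal(size=(3, 4))  # hypothetical stand-in for learned weights


def toy_encoder(x):
    """A deliberately non-invariant function of the coordinates."""
    return np.tanh(x @ weights).sum(axis=0)


coords = rng.normal(size=(12, 3))
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
moved = coords @ rotation.T + np.array([1.0, -2.0, 0.5])

original_out = frame_average_invariant(toy_encoder, coords)
moved_out = frame_average_invariant(toy_encoder, moved)
```

Comparing `original_out` with `moved_out` checks the invariance numerically. The construction relies on the principal axes being unique up to sign, so it assumes the scatter matrix has distinct eigenvalues, which holds for generic point clouds.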