Single-sequence protein-RNA complex structure prediction by geometric attention-enabled pairing of biological language models

Rahmatullah Roche, Sumit Tarafder, Debswapna Bhattacharya
DOI: https://doi.org/10.1101/2024.07.27.605468
2024-07-28
Abstract: Ground-breaking progress has been made in structure prediction of biomolecular assemblies, including the recent breakthrough of AlphaFold 3. However, it remains challenging for AlphaFold 3 and other state-of-the-art deep learning-based methods to accurately predict protein-RNA complex structures, in part due to the limited availability of evolutionary and structural information related to protein-RNA interactions that are used as inputs to the existing approaches. Here, we introduce ProRNA3D-single, a new deep-learning framework for protein-RNA complex structure prediction with only single-sequence input. Using a novel geometric attention-enabled pairing of biological language models of protein and RNA, a previously unexplored avenue, ProRNA3D-single enables the prediction of interatomic protein-RNA interaction maps, which are then transformed into multi-scale geometric restraints for modeling 3D structures of protein-RNA complexes via geometry optimization. Benchmark tests show that ProRNA3D-single convincingly outperforms current state-of-the-art methods including AlphaFold 3, particularly when evolutionary information is limited; and exhibits remarkable robustness and performance resilience by attaining better accuracy with only single-sequence input than what most methods can achieve even with explicit evolutionary information. Freely available at https://github.com/Bhattacharya-Lab/ProRNA3D-single, ProRNA3D-single should be broadly useful for modeling 3D structures of protein-RNA complexes at scale, regardless of the availability of evolutionary information.
Biology
What problem does this paper attempt to address?
This paper addresses the difficulty of predicting protein-RNA complex structures. Despite significant progress in biomolecular assembly structure prediction, including AlphaFold 3, accurate prediction of protein-RNA complexes remains hard. A main reason is that existing methods take evolutionary and structural information about protein-RNA interactions as input, and that information is often scarce. The paper therefore introduces ProRNA3D-single, a deep-learning framework that predicts protein-RNA complex structures from single-sequence input alone. ProRNA3D-single pairs protein and RNA biological language models through a geometric attention mechanism to predict a protein-RNA interaction map, converts that map into multi-scale geometric restraints, and builds the 3D structure of the complex by geometry optimization. In benchmark tests it outperforms current state-of-the-art methods such as AlphaFold 3, RoseTTAFold2NA and RoseTTAFold All-Atom, particularly when evolutionary information is limited.

The main contributions of the paper are as follows:

1. **Single-sequence input**: ProRNA3D-single predicts protein-RNA complex structures from a single protein sequence and a single RNA sequence, without multiple sequence alignment (MSA) information.
2. **Geometric attention mechanism**: Pairing protein and RNA biological language models through geometric attention improves prediction accuracy and robustness.
3. **Multi-scale geometric restraints**: The predicted interaction map is converted into multi-scale geometric restraints, which are used to optimize the spatial positions and orientations of the protein and RNA components and so produce a 3D model of the complex.

In conclusion, ProRNA3D-single offers a way to predict protein-RNA complex structures accurately when evolutionary information is unavailable, which makes it suitable for modeling such complexes at scale.
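
The step that turns a predicted interaction map into geometric restraints can be illustrated with a small sketch. This is not the ProRNA3D-single implementation. The function names `maps_to_restraints` and `restraint_penalty`, the 8 Å and 12 Å cutoffs, the probability threshold, and the flat-bottom harmonic penalty are all assumptions made for the example, and "multi-scale" is read here simply as "several distance cutoffs".

```python
import numpy as np


def maps_to_restraints(maps_by_cutoff, min_prob=0.5):
    """Convert interaction probability maps into distance restraints.

    maps_by_cutoff: dict mapping a distance cutoff (in angstroms) to an
        array of shape (n_protein_residues, n_rna_nucleotides) holding the
        predicted probability that the pair lies within that cutoff.
    Returns a list of (protein_index, rna_index, cutoff, weight) tuples.
    All names and thresholds are hypothetical, chosen for illustration.
    """
    restraints = []
    for cutoff, prob in sorted(maps_by_cutoff.items()):
        # Keep only the pairs the (hypothetical) predictor is confident about.
        prot_idx, rna_idx = np.nonzero(prob >= min_prob)
        for i, j in zip(prot_idx, rna_idx):
            restraints.append((int(i), int(j), float(cutoff), float(prob[i, j])))
    return restraints


def restraint_penalty(protein_xyz, rna_xyz, restraints):
    """Flat-bottom harmonic penalty: zero while a pair is within its cutoff,
    growing quadratically with the violation otherwise."""
    total = 0.0
    for i, j, cutoff, weight in restraints:
        dist = float(np.linalg.norm(protein_xyz[i] - rna_xyz[j]))
        total += weight * max(0.0, dist - cutoff) ** 2
    return total


# Toy example: 2 protein residues, 2 RNA nucleotides, two distance scales.
maps = {
    8.0: np.array([[0.9, 0.1], [0.2, 0.3]]),
    12.0: np.array([[0.95, 0.6], [0.2, 0.4]]),
}
restraints = maps_to_restraints(maps)

# One representative coordinate per residue or nucleotide (made-up values).
protein_xyz = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
rna_xyz = np.array([[10.0, 0.0, 0.0], [0.0, 13.0, 0.0]])
penalty = restraint_penalty(protein_xyz, rna_xyz, restraints)
print(len(restraints), penalty)
```

A geometry optimizer would then move the RNA relative to the protein to lower this penalty. Weighting each restraint by its predicted probability is one simple design choice among several, and it lets confident contacts dominate the fit.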