Refining the pool of RNA-binding domains advances the classification and prediction of RNA-binding proteins

Elsa Wassmer, Gergely Koppány, Malte Hermes, Sven Diederichs, Maïwen Caudron-Herger
DOI: https://doi.org/10.1093/nar/gkae536
IF: 14.9
2024-06-26
Nucleic Acids Research
Abstract: From transcription to decay, RNA-binding proteins (RBPs) influence RNA metabolism. Using the RBP2GO database that combines proteome-wide RBP screens from 13 species, we investigated the RNA-binding features of 176 896 proteins. By compiling published lists of RNA-binding domains (RBDs) and RNA-related protein family (Rfam) IDs with lists from the InterPro database, we analyzed the distribution of the RBDs and Rfam IDs in RBPs and non-RBPs to select RBDs and Rfam IDs that were enriched in RBPs. We also explored proteins for their content in intrinsically disordered regions (IDRs) and low complexity regions (LCRs). We found a strong positive correlation between IDRs and RBDs and a co-occurrence of specific LCRs. Our bioinformatic analysis indicated that RBDs/Rfam IDs were strong indicators of the RNA-binding potential of proteins and helped predict new RBP candidates, especially in less investigated species. By further analyzing RBPs without RBD, we predicted new RBDs that were validated by RNA-bound peptides. Finally, we created the RBP2GO composite score by combining the RBP2GO score with new quality factors linked to RBDs and Rfam IDs. Based on the RBP2GO composite score, we compiled a list of 2018 high-confidence human RBPs. The knowledge collected here was integrated into the RBP2GO database at https://RBP2GO-2-Beta.dkfz.de.
biochemistry & molecular biology
What problem does this paper attempt to address?
The paper aims to address the following issues:

1. **Classification and Prediction of RNA-Binding Proteins (RBPs)**: Proteome-wide screening methods have generated a large number of RBP datasets, but these datasets overlap poorly, and many proteins are detected in only one of them. A method is therefore needed to quantify how reliable each RBP candidate is.
2. **Selection and Identification of RNA-Binding Domains (RBDs)**: The paper compiles RBDs and RNA-related protein family (Rfam) IDs from published lists and from the InterPro database, then compares their distribution in RBPs and non-RBPs to select the RBDs and Rfam IDs that are enriched in RBPs. It also examines intrinsically disordered regions (IDRs) and low complexity regions (LCRs), finding a strong positive correlation between IDRs and RBDs and a co-occurrence of specific LCRs.
3. **Prediction and Validation of New RBDs**: For RBPs lacking a known RBD, the paper predicts new RBDs and validates them with RNA-bound peptides. Ultimately, the research team identified 15 new RBDs.
4. **Construction of the RBP2GO Composite Score**: By combining the RBP2GO score with new quality factors linked to RBDs and Rfam IDs, the paper creates the RBP2GO composite score. Based on this score, the paper compiles a list of 2018 high-confidence human RBPs.

In summary, the paper focuses on improving the classification and prediction of RBPs through bioinformatic methods, in particular the identification of new RBP candidates in less investigated species.
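
The selection of RBDs and Rfam IDs that are enriched in RBPs relative to non-RBPs can be illustrated with a small sketch. The text above does not say which statistical test the authors used, so the one-sided Fisher's exact test below is an assumption, as are the function names, the significance threshold `alpha` and the toy counts; none of them come from the paper or from RBP2GO.

```python
from math import comb


def enrichment_p_value(rbp_with, rbp_total, non_rbp_with, non_rbp_total):
    """One-sided Fisher's exact test for over-representation of a domain in RBPs.

    Returns P(X >= rbp_with) under the hypergeometric null, i.e. the chance of
    seeing at least this many domain-carrying proteins among the RBPs if the
    domain were distributed independently of RBP status.
    """
    if not (0 <= rbp_with <= rbp_total and 0 <= non_rbp_with <= non_rbp_total):
        raise ValueError("counts must satisfy 0 <= with <= total")
    total = rbp_total + non_rbp_total
    with_domain = rbp_with + non_rbp_with
    upper = min(with_domain, rbp_total)
    # Sum the upper tail of the hypergeometric distribution with exact integers.
    tail = sum(
        comb(with_domain, k) * comb(total - with_domain, rbp_total - k)
        for k in range(rbp_with, upper + 1)
    )
    return tail / comb(total, rbp_total)


def select_enriched(domain_counts, rbp_total, non_rbp_total, alpha=0.05):
    """Return the sorted domain IDs whose one-sided p-value is below alpha."""
    return sorted(
        domain
        for domain, (rbp_with, non_rbp_with) in domain_counts.items()
        if enrichment_p_value(rbp_with, rbp_total, non_rbp_with, non_rbp_total) < alpha
    )


# Hypothetical toy counts: domain ID -> (RBPs with domain, non-RBPs with domain).
toy_counts = {"DOMAIN_A": (8, 0), "DOMAIN_B": (2, 2)}
print(select_enriched(toy_counts, rbp_total=10, non_rbp_total=10))
```

In this toy example only `DOMAIN_A` passes the threshold, because all of its occurrences fall in the RBP group. A real analysis over many domains would also need a multiple-testing correction, which this sketch leaves out.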