HydRA: Deep-learning models for predicting RNA-binding capacity from protein interaction association context and protein sequence

Wenhao Jin, Kristopher W. Brannan, Katannya Kapeli, Samuel S. Park, Hui Qing Tan, Maya L. Gosztyla, Mayuresh Mujumdar, Joshua Ahdout, Bryce Henroid, Katherine Rothamel, Joy S. Xiang, Limsoon Wong, Gene W. Yeo
DOI: https://doi.org/10.1101/2022.12.23.521837
2022-12-24
bioRxiv
Abstract: RNA-binding proteins (RBPs) control RNA metabolism to orchestrate gene expression, and dysfunctional RBPs underlie many human diseases. Proteome-wide discovery efforts predict thousands of novel RBPs, many of which lack canonical RNA-binding domains. Here, we present a hybrid ensemble RBP classifier (HydRA) that leverages information from both intermolecular protein interactions and internal protein sequence patterns to predict RNA-binding capacity with unparalleled specificity and sensitivity, using support vector machines, convolutional neural networks, and transformer-based protein language models. HydRA enables Occlusion Mapping, which robustly detects known RNA-binding domains and predicts hundreds of uncharacterized RNA-binding domains. Enhanced CLIP validation for a diverse collection of RBP candidates reveals genome-wide targets and confirms RNA-binding activity for HydRA-predicted domains. The HydRA computational framework accelerates construction of a comprehensive RBP catalogue and expands the set of known RNA-binding protein domains.
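The abstract names Occlusion Mapping without describing it in detail. The sketch below illustrates the general occlusion idea under stated assumptions: slide a fixed-width window along a protein sequence, replace the covered residues with a mask character ("X" here, an assumption), and record how much a classifier's score drops. Residues whose masking causes large drops are candidate RNA-binding regions. The scorers `toy_motif_score`, `toy_composition_score`, and `toy_ensemble` are hypothetical stand-ins, not HydRA's SVM, CNN, or protein-language-model components, and the equal-weight averaging is an assumed way of combining component scores, not the paper's ensembling rule.

```python
from typing import Callable, List


def toy_motif_score(seq: str) -> float:
    # Hypothetical stand-in for a sequence model: counts RGG motifs.
    return float(seq.count("RGG"))


def toy_composition_score(seq: str) -> float:
    # Hypothetical stand-in: fraction of positively charged residues (K/R).
    return sum(c in "KR" for c in seq) / len(seq) if seq else 0.0


def toy_ensemble(seq: str) -> float:
    # Assumed combination rule: equal-weight average of the two toy scores.
    return 0.5 * (toy_motif_score(seq) + toy_composition_score(seq))


def occlusion_map(
    sequence: str,
    score_fn: Callable[[str], float],
    window: int = 5,
    mask_char: str = "X",
) -> List[float]:
    """Per-residue importance: mean score drop over all windows covering a residue."""
    if window < 1:
        raise ValueError("window must be >= 1")
    n = len(sequence)
    if n == 0:
        return []
    window = min(window, n)
    baseline = score_fn(sequence)
    totals = [0.0] * n
    counts = [0] * n
    for start in range(n - window + 1):
        # Mask residues [start, start + window) while keeping sequence length fixed.
        occluded = sequence[:start] + mask_char * window + sequence[start + window:]
        drop = baseline - score_fn(occluded)
        for i in range(start, start + window):
            totals[i] += drop
            counts[i] += 1
    # Every position is covered by at least one window, so counts are nonzero.
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    imp = occlusion_map("AAAARGGAAAA", toy_ensemble, window=3)
    print(max(range(len(imp)), key=imp.__getitem__))  # → 4 (the R of the RGG motif)
```

Averaging the drop over every window that covers a residue, rather than assigning it only to the window start, keeps the importance profile aligned with the residues actually masked; larger windows smooth the map at the cost of positional resolution.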