RNAmigos2: Fast and accurate structure-based RNA virtual screening with semi-supervised graph learning and large-scale docking data

Juan G Carvajal-Patino, Vincent Mallet, David Becerra, L. Fernando Nino V, Carlos Oliver, Jerome Waldispuhl
DOI: https://doi.org/10.1101/2023.11.23.568394
2024-06-25
Abstract: RNAs constitute a vast reservoir of mostly untapped drug targets. Structure-based virtual screening (VS) methods screen large compound libraries to identify promising candidate molecules by conditioning on binding site information. The classical approach relies on molecular docking simulations. However, this strategy does not scale well with the size of small-molecule databases and the number of potential RNA targets. Machine learning has emerged as a promising technology to resolve this bottleneck. Efficient data-driven VS methods have already been introduced for proteins, but these techniques have not yet been developed for RNAs due to limited dataset sizes and a lack of practical use-case evaluation. We propose a data-driven VS pipeline that deals with the unique challenges of RNA molecules through coarse-grained modeling of 3D structures and heterogeneous training regimes using synthetic data augmentation and RNA-centric self-supervision. We report strong prediction and generalizability of our framework, ranking active compounds among inactives in the top 1% on average on a structurally distinct drug-like test set. Our model results in a thousand-fold speedup over docking techniques while obtaining higher performance. Finally, we deploy our model on a recently published in vitro small-molecule microarray experiment with ~20,000 compounds and report enrichment factors at 1% of 8.8 to 16.8 on four unseen RNA riboswitches. This is the first experimental evidence of success for structure-based deep learning methods in RNA virtual screening. Our source code and data, as well as a Google Colab notebook for inference, are available on GitHub.
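The enrichment factor at 1% quoted in the abstract is a standard virtual screening metric: the hit rate among the top-scored 1% of compounds divided by the hit rate of the whole library, so a random ranking gives an expected value of about 1. The sketch below shows that usual definition. The function name `enrichment_factor`, the rounding of the cutoff, and the tie handling are assumptions made for illustration; the abstract does not state how the authors compute these details, and the toy data is not from the paper.

```python
def enrichment_factor(scores, is_active, fraction=0.01):
    """Enrichment factor: hit rate in the top `fraction` / overall hit rate.

    scores    : predicted scores, higher means "more likely active"
    is_active : 0/1 (or bool) labels aligned with `scores`
    fraction  : fraction of the ranked library to inspect (0.01 for EF at 1%)
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Cutoff size: nearest integer to fraction * library size, at least 1.
    # (Assumption: other rounding conventions are also in use.)
    n_top = max(1, round(fraction * n_total))

    # Rank compound indices by decreasing score; ties keep input order.
    ranked = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits_top = sum(1 for i in ranked[:n_top] if is_active[i])

    return (hits_top / n_top) / (n_actives / n_total)


# Toy library: 1000 compounds, 10 actives, 5 of them ranked in the top 10.
toy_scores = [1000 - i for i in range(1000)]
toy_labels = [1 if (i < 5 or 500 <= i < 505) else 0 for i in range(1000)]
toy_ef = enrichment_factor(toy_scores, toy_labels, fraction=0.01)
print(f"EF at 1% on toy data: {toy_ef:.1f}")
```

Under this definition, an enrichment factor of 8.8 to 16.8 at 1% means the top-ranked 1% of the screened library contains roughly 9 to 17 times more actives than a random selection of the same size would be expected to contain.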
Bioinformatics