RNA-ligand interaction scoring via data perturbation and augmentation modeling

Hongli Ma,Letian Gao,Yunfan Jin,Yilan Bai,Xiaofan Liu,Pengfei Bao,Ke Liu,Zhenjiang Zech Xu,Zhi John Lu
DOI: https://doi.org/10.1101/2024.06.26.600802
2024-01-01
Abstract:RNA-targeting drug discovery is undergoing an unprecedented revolution. Despite recent advances in this field, developing data-driven deep learning models remains challenging due to the limited availability of validated RNA-small molecule interactions and the scarcity of known RNA structures. In this context, we introduce RNAsmol, a novel sequence-based deep learning framework that incorporates data perturbation with augmentation, graph-based molecular feature representation and attention-based feature fusion modules to predict RNA-small molecule interactions. RNAsmol employs perturbation strategies to balance the bias between true negative and unknown interaction space thereby elucidating the intrinsic binding patterns between RNA and small molecules. The resulting model demonstrates accurate predictions of the binding between RNA and small molecules, outperforming other methods with average improvements of ∼8% (AUROC) in 10-fold cross-validation, ∼16% (AUROC) in cold evaluation (on unseen datasets), and ∼30% (ranking score) in decoy evaluation. Moreover, we use case studies to validate molecular binding hotspots in the prediction of RNAsmol, proving the model’s interpretability. In particular, we demonstrate that RNAsmol, without requiring structural input, can generate reliable predictions and be adapted to many RNA-targeting drug design scenarios. ### Competing Interest Statement The authors have declared no competing interest.
What problem does this paper attempt to address?