Activity Cliff-Informed Contrastive Learning for Molecular Property Prediction

Wanxiang Shen, Chao Cui, Xiang Cheng Shi, Yan Bing Zhang, Jie Wu, Yu Zong Chen, Xiaorui Su, Zaixi Zhang, Alejandro Velez-Arce, Jianming Wang, Marinka Zitnik
DOI: https://doi.org/10.26434/chemrxiv-2023-5cz7s-v2
2024-11-07
Abstract: Modeling molecular activity and the quantitative structure-activity relationships of chemical compounds is critical in drug design. Graph neural networks, which operate directly on molecular structures represented as graphs, have shown success in assessing the biological activity of chemical compounds, guiding the selection and optimization of candidates for further development. However, current models often overlook activity cliffs (ACs), pairs of structurally similar molecules that exhibit markedly different bioactivities, because their latent spaces are optimized primarily for structural features. Here, we introduce AC-awareness (ACA), an inductive bias designed to enhance molecular representation learning for activity modeling. ACA jointly optimizes metric learning in the latent space and task performance in the target space, making models more sensitive to ACs. We develop an AC-informed contrastive learning approach that can be integrated with any graph neural network. Experiments on 39 benchmark datasets demonstrate that models using AC-informed representations of chemical compounds consistently outperform standard models in bioactivity prediction across both regression and classification tasks. AC-informed models also show strong performance in predicting pharmacokinetic and safety-relevant molecular properties. ACA paves the way toward activity-informed molecular representations, providing a valuable tool for the early stages of lead compound identification, refinement, and virtual screening.
Chemistry
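The abstract states that ACA jointly optimizes a metric-learning objective in the latent space and the prediction task in the target space. The Python sketch below illustrates one way such a combined objective could look; it is not the authors' implementation. Every name and choice in it is a hypothetical assumption: triplets are mined from activity labels alone using two thresholds (`cliff_lower`, `cliff_upper`), the metric term is a standard hinge triplet loss on Euclidean distances, the task term is mean squared error, and `alpha` weights the two terms. In a real model the embeddings would come from a graph neural network encoder and the loss would be minimized by gradient descent; here the loss is only evaluated.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length latent vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mine_ac_triplets(labels, cliff_lower=0.5, cliff_upper=1.0):
    """Build (anchor, positive, negative) index triplets from activity labels.

    Hypothetical rule (assumes cliff_lower <= cliff_upper): a positive differs
    from the anchor by less than cliff_lower; a negative differs by more than
    cliff_upper. Pairs whose difference falls between the thresholds are unused.
    """
    triplets = []
    for a, y_a in enumerate(labels):
        for p, y_p in enumerate(labels):
            if p == a or abs(y_a - y_p) >= cliff_lower:
                continue
            for n, y_n in enumerate(labels):
                if n not in (a, p) and abs(y_a - y_n) > cliff_upper:
                    triplets.append((a, p, n))
    return triplets


def triplet_loss(embeddings, triplets, margin=1.0):
    """Mean hinge triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    if not triplets:
        return 0.0
    total = 0.0
    for a, p, n in triplets:
        d_ap = euclidean(embeddings[a], embeddings[p])
        d_an = euclidean(embeddings[a], embeddings[n])
        total += max(0.0, d_ap - d_an + margin)
    return total / len(triplets)


def mse(preds, labels):
    """Mean squared error, standing in for the regression task loss."""
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)


def aca_style_loss(embeddings, preds, labels, alpha=1.0, margin=1.0,
                   cliff_lower=0.5, cliff_upper=1.0):
    """Target-space task loss plus alpha times the latent-space triplet loss."""
    triplets = mine_ac_triplets(labels, cliff_lower, cliff_upper)
    return mse(preds, labels) + alpha * triplet_loss(embeddings, triplets, margin)


if __name__ == "__main__":
    labels = [1.0, 1.2, 3.0]  # molecule 2 is far more active than 0 and 1
    preds = [1.0, 1.2, 3.0]   # perfect predictions, so only the metric term remains
    close = [(0.0, 0.0), (1.0, 0.0), (0.0, 0.5)]  # molecule 2 sits near molecule 0
    far = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0)]    # molecule 2 pushed away
    print(aca_style_loss(close, preds, labels))   # positive (about 1.19)
    print(aca_style_loss(far, preds, labels))     # prints 0.0
```

Because the hinge term is zero for negatives that already lie far from the anchor, the metric loss concentrates on negatives that sit close in latent space despite a large activity gap, which is the situation an activity cliff creates when the encoder mainly reflects structural similarity. In the example, moving molecule 2 away from molecule 0 drives the metric term to zero.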