Meta Learning for Low-Resource Molecular Optimization
Jiahao Wang, Shuangjia Zheng, Jianwen Chen, Yuedong Yang
DOI: https://doi.org/10.1021/acs.jcim.0c01416
IF: 6.162
2021-03-17
Journal of Chemical Information and Modeling
Abstract: The goal of molecular optimization (MO) is to discover molecules with improved pharmaceutical properties relative to a known starting molecule. Despite many recent successes of new approaches for MO, these methods were typically developed for particular properties with rich annotated training examples. Thus, these approaches are difficult to apply in real-world settings, where only a small amount of pharmaceutical data is usually available because data collection is expensive and labor-intensive. Here, we propose a new approach, Meta-MO, for molecular optimization with a handful of training samples, based on well-established first-order meta-learning algorithms. By using a set of meta tasks with rich training samples, Meta-MO trains a meta model through meta-learning optimization and adapts the learned model to new low-resource MO tasks. Meta-MO was shown to consistently outperform several pretraining and multitask training procedures, providing an average improvement in the success rate of 4.3% on a large-scale bioactivity data set with diverse target variations. We also observed that Meta-MO resulted in the best-performing models across fine-tuning sets with only dozens of samples. To the best of our knowledge, this is the first study to apply meta learning to MO tasks. More importantly, such a strategy could be further extended to many low-resource scenarios in real-world drug design.
Supporting Information: The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.0c01416.
Detailed descriptions of graph encoder and Transformer architecture; Table S1, model input representations for atoms; Table S2, R², RMSE, and MAE metrics for query task scoring models; Table S3, standard deviations of data in Table 4; and Figures S1–S4, distributions of source molecule weight, synthetic accessibility score, logP score, and bioactivity (PDF).
Data set splits of tasks (XLSX).
chemistry, multidisciplinary, medicinal, computer science, interdisciplinary applications, information systems
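The abstract describes training a meta model on data-rich meta tasks with a first-order meta-learning algorithm and then adapting it to a low-resource task. The sketch below illustrates that general train-then-adapt loop on a toy problem. It is not the authors' implementation: the abstract does not name a specific algorithm, so a Reptile-style update is assumed here as one example of a first-order method, and a scalar linear regressor stands in for the graph encoder and Transformer mentioned in the Supporting Information. All function names, task definitions, and hyperparameters are hypothetical.

```python
import numpy as np


def make_task(rng, n_samples):
    """Hypothetical toy task: y = slope * x with a task-specific slope near 2."""
    slope = 2.0 + 0.3 * rng.standard_normal()
    x = rng.uniform(-1.0, 1.0, size=n_samples)
    return x, slope * x


def mse(w, x, y):
    """Mean squared error of the one-parameter model y_hat = w * x."""
    return float(np.mean((w * x - y) ** 2))


def sgd_adapt(w, x, y, lr, steps):
    """Task-level adaptation: a few gradient steps on the MSE loss."""
    for _ in range(steps):
        grad = 2.0 * np.mean((w * x - y) * x)
        w = w - lr * grad
    return w


def reptile_train(rng, meta_iters=200, inner_steps=5, inner_lr=0.1, meta_lr=0.1):
    """Reptile-style first-order meta-training (assumed stand-in algorithm)."""
    w = 0.0
    for _ in range(meta_iters):
        # Sample a data-rich meta task and adapt a copy of the meta parameter.
        x, y = make_task(rng, n_samples=100)
        w_task = sgd_adapt(w, x, y, inner_lr, inner_steps)
        # First-order meta update: move toward the adapted parameter,
        # with no second derivatives through the inner loop.
        w = w + meta_lr * (w_task - w)
    return w


rng = np.random.default_rng(0)
w_meta = reptile_train(rng)

# Low-resource target task: only 10 samples available for adaptation.
x_new, y_new = make_task(rng, n_samples=10)
w_adapted = sgd_adapt(w_meta, x_new, y_new, lr=0.1, steps=5)
w_scratch = sgd_adapt(0.0, x_new, y_new, lr=0.1, steps=5)

print("loss, meta init before adaptation:", mse(w_meta, x_new, y_new))
print("loss, meta init after adaptation: ", mse(w_adapted, x_new, y_new))
print("loss, zero init after adaptation: ", mse(w_scratch, x_new, y_new))
```

In this toy setting the meta-trained initialization starts close to the typical task solution, so the same small number of adaptation steps should leave a lower loss than starting from zero. Whether that carries over to real MO tasks depends on how related the meta tasks and the target task are.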