De Novo Drug Design by Multi-Objective Path Consistency Learning with Beam A∗ Search

Dengwei Zhao, Jingyuan Zhou, Shikui Tu, Lei Xu
DOI: https://doi.org/10.1109/TCBB.2024.3477592
2024-10-09
Abstract: Generating high-quality and drug-like molecules from scratch within the expansive chemical space presents a significant challenge in the field of drug discovery. In prior research, value-based reinforcement learning algorithms have been employed to generate molecules with multiple desired properties iteratively. The immediate reward was defined as the evaluation of intermediate-state molecules at each step, and the learning objective would be maximizing the expected cumulative evaluation scores for all molecules along the generative path. However, this definition of the reward was misleading, as in reality, the optimization target should be the evaluation score of only the final generated molecule. Furthermore, in previous works, randomness was introduced into the decision-making process, enabling the generation of diverse molecules but no longer pursuing the maximum future rewards. In this paper, immediate reward is defined as the improvement achieved through the modification of the molecule to maximize the evaluation score of the final generated molecule exclusively. Originating from the A∗ search, path consistency (PC), i.e., f values on one optimal path should be identical, is employed as the objective function in the update of the f value estimator to train a multi-objective de novo drug designer. By incorporating the f value into the decision-making process of beam search, the DrugBA∗ algorithm is proposed to enable the large-scale generation of molecules that exhibit both high quality and diversity. Experimental results demonstrate a substantial enhancement over the state-of-the-art algorithm QADD in multiple molecular properties of the generated molecules.
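The two ideas in the abstract, improvement-based rewards and path consistency, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the scores are made-up numbers, the function names are hypothetical, and the loss form (mean squared deviation of f values from their mean along a path) is an assumption, since the abstract does not state the exact objective. The sketch also assumes the usual A∗ decomposition f = g + h, with g the reward accumulated so far and h the estimated future reward.

```python
from typing import List


def improvement_rewards(scores: List[float]) -> List[float]:
    """Immediate reward at each step: the change in evaluation score."""
    return [after - before for before, after in zip(scores, scores[1:])]


def path_consistency_loss(f_values: List[float]) -> float:
    """Assumed PC penalty: mean squared deviation of f from its path mean."""
    mean_f = sum(f_values) / len(f_values)
    return sum((f - mean_f) ** 2 for f in f_values) / len(f_values)


# Hypothetical evaluation scores of the molecules along one generative path.
# The score dips at the third molecule before reaching the final value.
scores = [0.0, 0.5, 0.25, 0.75]

rewards = improvement_rewards(scores)
# The rewards telescope: their sum depends only on the first and last score.
total_reward = sum(rewards)

# g: reward accumulated so far; h: future reward under a perfect estimator.
g_values = [s - scores[0] for s in scores]
h_values = [scores[-1] - s for s in scores]
f_values = [g + h for g, h in zip(g_values, h_values)]

# With a perfect h, every f value on the path is identical, so the loss is 0.
pc_loss = path_consistency_loss(f_values)
```

Because the improvement rewards telescope, maximizing their sum is the same as maximizing the score of the final molecule alone, which is the optimization target the abstract argues for. A learned estimator would generally give unequal f values along a path, and a penalty of this kind would push them toward agreement.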