Target-aware Molecule Generation for Drug Design Using a Chemical Language Model

Yingce Xia, Kehan Wu, Pan Deng, Renhe Liu, Yuan Zhang, Han Guo, Yumeng Cui, Qizhi Pei, Lijun Wu, Shufang Xie, Si Chen, Xi Lu, Song Hu, Jinzhi Wu, Chi-Kin Chan, Shuo Chen, Liangliang Zhou, Nenghai Yu, Haiguang Liu, Jinjiang Guo, Tao Qin, Tie-Yan Liu
DOI: https://doi.org/10.1101/2024.01.08.574635
2024-02-01
Abstract: Generative drug design facilitates the creation of compounds effective against pathogenic target proteins. This opens up the potential to discover novel compounds within the vast chemical space and fosters the development of innovative therapeutic strategies. However, the practicality of generated molecules is often limited, as many designs focus on a narrow set of drug-related properties and fail to improve the success rate of the subsequent drug discovery process. To overcome these challenges, we develop TamGen, a method that employs a GPT-like chemical language model and enables target-aware molecule generation and compound refinement. We demonstrate that the compounds generated by TamGen have improved molecular quality and viability. Additionally, we have integrated TamGen into a drug discovery pipeline and identified 7 compounds showing compelling inhibitory activity against the Tuberculosis ClpP protease, with the most effective compound exhibiting a half maximal inhibitory concentration (IC50) of 1.9 μM. Our findings underscore the practical potential and real-world applicability of generative drug design approaches, paving the way for future advancements in the field.
Biochemistry
What problem does this paper attempt to address?
The paper proposes TamGen, a method for target-aware generation of drug molecules, with the aim of improving the drug design process. Although many deep learning-based screening and generation techniques exist, the molecules they produce often optimize only a narrow set of drug-related properties, which limits the success rate of later drug discovery stages. TamGen uses a GPT-like chemical language model to support both target-aware molecule generation and compound refinement.

The main contributions of the paper are:
1. TamGen employs a GPT-style chemical language model that generates drug compounds with better molecular quality and viability.
2. The researchers integrate TamGen into a drug discovery pipeline and, in experiments targeting the ClpP protease of tuberculosis (TB), identify 7 compounds with strong inhibitory activity; the most effective compound has a half-maximal inhibitory concentration (IC50) of 1.9 μM.
3. The compounds generated by TamGen increase the diversity of the candidate pool and provide effective anchors for hit expansion and structure-activity relationship (SAR) studies, demonstrating the potential of generative drug design in practical applications.

The experimental results show that, compared with other current methods, TamGen achieves a better balance among the chemical plausibility of generated compounds, pharmacological activity, and synthetic accessibility, and that it is efficient in practical drug discovery. In the case study on the tuberculosis ClpP protease, TamGen designs new molecules with inhibitory activity, opening new possibilities for addressing drug resistance. These findings point to broad application prospects for generative drug design methods in drug development.
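To make the reported IC50 concrete, the sketch below evaluates a standard Hill-type dose-response curve, in which the IC50 is the inhibitor concentration that yields 50% inhibition. Only the 1.9 μM value comes from the text above; the function name `fractional_inhibition`, the Hill coefficient of 1, and the sample concentrations are illustrative assumptions, not values or methods reported in the paper.

```python
def fractional_inhibition(conc_um, ic50_um=1.9, hill=1.0):
    """Fraction of activity inhibited at inhibitor concentration conc_um (in uM).

    Uses the Hill equation: I = c^n / (IC50^n + c^n).
    ic50_um defaults to the 1.9 uM reported for the best compound;
    hill=1.0 is an assumed coefficient, not a value from the paper.
    """
    if conc_um < 0:
        raise ValueError("concentration must be non-negative")
    if conc_um == 0:
        return 0.0
    return conc_um ** hill / (ic50_um ** hill + conc_um ** hill)


if __name__ == "__main__":
    # Hypothetical concentrations one decade below, at, and above the IC50.
    for conc in (0.19, 1.9, 19.0):
        print(f"{conc:6.2f} uM -> {fractional_inhibition(conc):.1%} inhibition")
```

Under these assumptions, inhibition is exactly 50% at 1.9 μM, and a tenfold change in concentration moves it to roughly 9% or 91%. Real assay curves may have a different slope, so the Hill coefficient would normally be fitted to measured data.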