Deep learning methods for molecular representation and property prediction

Zhen Li, Mingjian Jiang, Shuang Wang, Shugang Zhang
DOI: https://doi.org/10.1016/j.drudis.2022.103373
Impact factor (IF): 8.369
2022-09-29
Drug Discovery Today
Abstract: With advances in artificial intelligence (AI) methods, computer-aided drug design (CADD) has developed rapidly in recent years. Effective molecular representation and accurate property prediction are crucial tasks in CADD workflows. In this review, we summarize contemporary applications of deep learning (DL) methods for molecular representation and property prediction. We categorize DL methods according to the format of molecular data (1D, 2D, and 3D). In addition, we discuss some common DL...
pharmacology & pharmacy
What problem does this paper attempt to address?
The paper surveys how deep learning (DL) methods are applied to molecular representation and property prediction, and addresses four main issues:

1. **Effective Molecular Representation Methods**: DL models need molecular structures encoded in a form they can process. A molecular formula alone carries no structural information, which makes property prediction difficult, so researchers have proposed representations such as SMILES strings, molecular fingerprints, and molecular graphs to capture the spatial relationships and other important features of molecules.
2. **Improving Prediction Accuracy**: The paper reviews how DL models and techniques, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and their variants, are used to improve the accuracy of molecular property prediction. It also covers self-supervised learning (SSL), which mines intrinsic features from unlabeled datasets and so reduces the need for large numbers of labeled samples.
3. **Enhancing Model Interpretability**: Because complex DL models are hard to interpret, the paper discusses how specially designed tasks or new architectures can reveal how a model works internally. For example, attention mechanisms can highlight the most important substructures within a molecule.
4. **Multimodal Data Integration**: Beyond single-format representations, the paper describes how different types of molecular data (1D sequences, 2D graphs, 3D structures, etc.) can be combined to characterize molecules more completely and improve prediction performance.

In summary, by bringing together recent deep learning techniques, the paper aims to improve the efficiency and accuracy of molecular representation and property prediction in computer-aided drug design (CADD), and thereby to better support new drug discovery.
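To make the graph representation from item 1 and the attention-based interpretability from item 3 more concrete, here is a minimal, dependency-free Python sketch. It is an illustration only, not the method of any specific model reviewed in the paper. It assumes a hand-coded heavy-atom graph for ethanol (SMILES `CCO`, hydrogens omitted), a hypothetical three-element one-hot atom vocabulary (`ELEMENTS`), one round of mean-aggregation message passing, and a softmax attention readout. All weights are random and untrained, so the printed attention values demonstrate the mechanism and carry no chemical meaning. In a trained model, these per-atom weights are what would be inspected to see which substructures drive a prediction.

```python
import math
import random

# Hypothetical toy input: heavy atoms of ethanol (SMILES "CCO"), hydrogens omitted.
ATOMS = ["C", "C", "O"]
BONDS = [(0, 1), (1, 2)]      # undirected bonds as (atom index, atom index)
ELEMENTS = ["C", "N", "O"]    # assumed one-hot vocabulary for atom features
HIDDEN = 4                    # assumed hidden size


def one_hot(symbol):
    """Encode an element symbol as a one-hot vector over ELEMENTS."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def neighbors(n_atoms, bonds):
    """Build an adjacency list from an undirected bond list."""
    adj = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def message_passing(features, adj, w_self, w_neigh):
    """One round: h_i' = ReLU(W_self h_i + W_neigh * mean of neighbor features)."""
    dim = len(features[0])
    updated = []
    for i, h in enumerate(features):
        nbrs = adj[i]
        if nbrs:
            mean = [sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(dim)]
        else:
            mean = [0.0] * dim  # isolated atom: no incoming messages
        pre = [a + b for a, b in zip(matvec(w_self, h), matvec(w_neigh, mean))]
        updated.append([max(0.0, v) for v in pre])  # ReLU
    return updated


def attention_readout(hidden, query):
    """Softmax attention pooling: returns (graph vector, per-atom weights)."""
    scores = [sum(q * v for q, v in zip(query, h)) for h in hidden]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    graph_vec = [sum(w * h[k] for w, h in zip(weights, hidden))
                 for k in range(len(hidden[0]))]
    return graph_vec, weights


random.seed(0)
# Random, untrained parameters: the attention values below illustrate the mechanism only.
w_self = [[random.gauss(0.0, 0.5) for _ in ELEMENTS] for _ in range(HIDDEN)]
w_neigh = [[random.gauss(0.0, 0.5) for _ in ELEMENTS] for _ in range(HIDDEN)]
query = [random.gauss(0.0, 1.0) for _ in range(HIDDEN)]

x = [one_hot(a) for a in ATOMS]
h = message_passing(x, neighbors(len(ATOMS), BONDS), w_self, w_neigh)
graph_vec, weights = attention_readout(h, query)

for idx, (atom, w) in enumerate(zip(ATOMS, weights)):
    print(f"atom {idx} ({atom}): attention weight {w:.3f}")
print("graph embedding:", [round(v, 3) for v in graph_vec])
```

In practice, a cheminformatics toolkit (for example RDKit) would typically parse the SMILES string into atom and bond lists instead of writing them by hand. The message-passing step would also usually be repeated several times with learned weights before the readout.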