Hyperbolic relational graph convolution networks plus: a simple but highly efficient QSAR-modeling method

Zhenxing Wu, Dejun Jiang, Chang-Yu Hsieh, Guangyong Chen, Ben Liao, Dongsheng Cao, Tingjun Hou
DOI: https://doi.org/10.1093/bib/bbab112
IF: 9.5
2021-04-19
Briefings in Bioinformatics
Abstract: Accurate predictions of druggability and bioactivities of compounds are desirable to reduce the high cost and time of drug discovery. After more than five decades of continuing developments, quantitative structure–activity relationship (QSAR) methods have been established as indispensable tools that facilitate fast, reliable and affordable assessments of physicochemical and biological properties of compounds in drug-discovery programs. Currently, there are mainly two types of QSAR methods, descriptor-based methods and graph-based methods. The former is developed based on predefined molecular descriptors, whereas the latter is developed based on simple atomic and bond information. In this study, we presented a simple but highly efficient modeling method by combining molecular graphs and molecular descriptors as the input of a modified graph neural network, called hyperbolic relational graph convolution network plus (HRGCN+). The evaluation results show that HRGCN+ achieves state-of-the-art performance on 11 drug-discovery-related datasets. We also explored the impact of the addition of traditional molecular descriptors on the predictions of graph-based methods, and found that the addition of molecular descriptors can indeed boost the predictive power of graph-based methods. The results also highlight the strong anti-noise capability of our method. In addition, our method provides a way to interpret models at both the atom and descriptor levels, which can help medicinal chemists extract hidden information from complex datasets. We also offer an HRGCN+'s online prediction service at https://quantum.tencent.com/hrgcn/.
biochemical research methods, mathematical & computational biology
What problem does this paper attempt to address?
This paper addresses several key challenges in quantitative structure–activity relationship (QSAR) modeling for drug discovery. The authors propose a modified graph neural network, the hyperbolic relational graph convolution network plus (HRGCN+), that takes both molecular graphs and molecular descriptors as input. The main problems it targets are:

1. **Improve the prediction accuracy of QSAR models**:
   - Traditional QSAR methods face two main limitations in the big-data era: molecular descriptors are predefined and cannot be updated automatically, and it is difficult to select the descriptors most relevant to a specific property. As datasets grow, descriptor-based methods may also suffer from overfitting or feature redundancy.
   - Graph neural networks (GNNs) can learn representations directly from molecular graphs, but they may perform poorly on small datasets because there is too little data to learn from.
2. **Combine the advantages of graph representations and descriptors**:
   - HRGCN+ uses molecular graphs and molecular descriptors together. Molecular graphs allow intuitive interpretation at the atom and bond levels, while molecular descriptors supply additional chemical information that helps the model generalize, especially on small datasets.
3. **Enhance the noise resistance and interpretability of the model**:
   - Noise is inevitable in drug-discovery datasets. HRGCN+ shows strong noise resistance, maintaining high predictive performance on noisy data.
   - HRGCN+ also provides a way to interpret the model at both the atom and descriptor levels, helping medicinal chemists extract valuable information from complex data.
4. **Implement an efficient and easy-to-use QSAR modeling tool**:
   - The paper reports the performance of HRGCN+ on multiple drug-discovery-related datasets and provides an online prediction service, so that researchers can readily use the method for QSAR modeling.

In summary, HRGCN+ is proposed to overcome the limitations of existing QSAR methods in the big-data era, to improve prediction accuracy, noise resistance and interpretability, and to provide an efficient, easy-to-use tool for drug discovery.
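The idea of feeding a molecular graph and a descriptor vector into one network can be illustrated with a small, self-contained sketch. It is not the HRGCN+ architecture: it assumes a single plain relational graph convolution step (one weight matrix per bond type, mean-normalized messages), a sum readout, and a simple concatenation with the descriptors before a logistic output. All function names, the toy molecule, the descriptor values and the random, untrained weights are hypothetical and serve only to show the data flow.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def rgcn_layer(node_feats, edges, W_self, W_rel):
    """One toy relational graph convolution step with ReLU.

    edges holds directed (src, dst, relation) tuples; each relation
    (here: bond type) has its own weight matrix in W_rel.
    """
    out = [matvec(W_self, h) for h in node_feats]
    # Count incoming edges per (node, relation) to average the messages.
    counts = {}
    for _, dst, rel in edges:
        counts[(dst, rel)] = counts.get((dst, rel), 0) + 1
    for src, dst, rel in edges:
        msg = matvec(W_rel[rel], node_feats[src])
        scale = 1.0 / counts[(dst, rel)]
        out[dst] = [o + scale * m for o, m in zip(out[dst], msg)]
    return [[max(0.0, v) for v in h] for h in out]


def predict(node_feats, edges, descriptors, params):
    """Return (probability, fused vector) for one molecule."""
    hidden = rgcn_layer(node_feats, edges, params["W_self"], params["W_rel"])
    graph_vec = [sum(col) for col in zip(*hidden)]  # sum readout over atoms
    fused = graph_vec + list(descriptors)           # graph + descriptor fusion
    logit = sum(w * v for w, v in zip(params["w_out"], fused)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit)), fused


def random_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
in_dim, hidden_dim, n_relations = 2, 2, 2

# Hypothetical three-atom molecule: made-up atom features and bond types.
node_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = [(0, 1, 0), (1, 0, 0), (1, 2, 1), (2, 1, 1)]
descriptors = [0.46, 0.20, 1.00]  # made-up, pre-scaled descriptor values

params = {
    "W_self": random_matrix(hidden_dim, in_dim, rng),
    "W_rel": [random_matrix(hidden_dim, in_dim, rng) for _ in range(n_relations)],
    "w_out": [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim + len(descriptors))],
    "b_out": 0.0,
}

prob, fused = predict(node_feats, edges, descriptors, params)
print(f"fused vector length: {len(fused)}, predicted probability: {prob:.3f}")
```

Concatenating after the readout keeps the two information sources separable: the first `hidden_dim` entries of `fused` come from the graph and the rest are the descriptors, which is one simple way a model could be inspected at both the atom and the descriptor level.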