Abstract: Molecular activity prediction is critical in drug design. Machine learning techniques such as kernel methods and random forests have been successful for this task. These models require fixed-size feature vectors as input, while molecules vary in size and structure. As a result, fixed-size fingerprint representations handle the substructures of large molecules poorly. In addition, molecular activity tests, so-called BioAssays, cover relatively few tested molecules because of their complexity. Here we approach the problem through deep neural networks, as they are flexible in modeling structured data such as grids, sequences and graphs. We train on multiple BioAssays using a multi-task learning framework, which combines information from multiple sources to improve prediction performance, especially on small datasets. We propose Graph Memory Network (GraphMem), a memory-augmented neural network to model the graph structure in molecules. GraphMem consists of a recurrent controller coupled with an external memory whose cells dynamically interact and change through a multi-hop reasoning process. Applied to molecules, the dynamic interactions enable an iterative refinement of the representation of molecular graphs with multiple bond types. GraphMem is capable of jointly training on multiple datasets by using a task-specific query fed to the controller as an input. We demonstrate the effectiveness of the proposed model for separate and joint training on more than 100K measurements, spanning 9 BioAssay activity tests.

What problem does this paper attempt to address?

This paper addresses three key problems in molecular activity prediction:

1. **Molecular representation problem**: Traditional machine-learning methods such as kernel methods, support vector machines (SVM), and random forests require fixed-size feature vectors as input. Molecules, however, vary in size and structure, so a fixed-size fingerprint representation loses substructure information for large molecules, which degrades prediction quality.
2. **Multi-task learning problem**: In drug design, bioactivity testing is expensive, so the available datasets are usually small. Deep neural networks trained on such datasets are prone to overfitting. A multi-task learning framework combines information from multiple data sources to improve prediction performance, especially on small datasets.
3. **Graph-structure modeling problem**: Molecular structures are essentially graphs, with atoms as nodes and chemical bonds as edges. Existing deep-learning methods have limitations when dealing with such structured data, so a method that models graph structure effectively is needed to improve the accuracy of molecular activity prediction.

To solve these problems, the authors propose the **Graph Memory Network (GraphMem)**. GraphMem is a memory-augmented neural network that handles graph-structured data flexibly and dynamically updates the contents of its memory cells through a multi-hop reasoning process. Specifically, GraphMem has the following characteristics:

- **Dynamic memory units**: Each memory cell corresponds to a node in the molecular graph. The cells interact dynamically through a multi-hop reasoning process, gradually refining the representation of the molecular graph.
- **Multi-task learning**: A task-specific query is fed to the controller as input, which allows GraphMem to be trained jointly on multiple tasks and improves prediction performance on small datasets.
- **Flexibility**: GraphMem can be used not only for molecular activity prediction but can also be extended to graph-data processing tasks in other fields, such as text and visual question answering.

Through experiments on 9 NCI BioAssay activity test datasets, the authors verified the effectiveness of GraphMem, especially in the multi-task learning setting, where GraphMem significantly improved prediction performance.
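The multi-hop mechanism described above can be illustrated with a small NumPy sketch. It is not the authors' architecture or equations: the update rules, the parameter names (`W_in`, `W_q`, `W_bond`, and so on), the helper `init_params`, and the toy three-atom graph are all assumptions chosen only to show the idea of one memory cell per atom, a controller seeded by a task query, and bond-type-specific interactions between cells.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def init_params(n_feat, n_tasks, n_bond_types, d, seed=0):
    """Random, untrained weights (hypothetical names, for illustration only)."""
    rng = np.random.default_rng(seed)
    g = lambda *shape: rng.normal(scale=0.1, size=shape)
    return {
        "W_in": g(n_feat, d), "W_q": g(n_tasks, d),
        "W_hh": g(d, d), "W_rh": g(d, d),
        "W_mm": g(d, d), "W_hm": g(d, d),
        "W_bond": g(n_bond_types, d, d), "w_out": g(d),
    }

def graph_memory_forward(atom_feats, bonds, task_query, params, n_hops=3):
    """Toy multi-hop pass over a molecular graph (assumed update rules)."""
    M = np.tanh(atom_feats @ params["W_in"])   # (n_atoms, d): one memory cell per atom
    h = np.tanh(task_query @ params["W_q"])    # (d,): controller state from the task query
    for _ in range(n_hops):
        # Controller reads the memory with soft attention and updates its state.
        attn = softmax(M @ h)                  # (n_atoms,)
        read = attn @ M                        # (d,)
        h = np.tanh(h @ params["W_hh"] + read @ params["W_rh"])
        # Each cell aggregates its neighbours through a bond-type-specific matrix.
        msg = np.zeros_like(M)
        for i, j, bond_type in bonds:          # bonds are undirected: send both ways
            msg[i] += M[j] @ params["W_bond"][bond_type]
            msg[j] += M[i] @ params["W_bond"][bond_type]
        M = np.tanh(M @ params["W_mm"] + msg + h @ params["W_hm"])
    logit = h @ params["w_out"]
    return 1.0 / (1.0 + np.exp(-logit))        # activity probability for the queried task

# Toy molecule: C-C-O heavy atoms, one-hot features over [C, O], single bonds (type 0).
atoms = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
bonds = [(0, 1, 0), (1, 2, 0)]
params = init_params(n_feat=2, n_tasks=9, n_bond_types=4, d=8)

task0 = np.eye(9)[0]                           # one-hot query selecting BioAssay 0
task1 = np.eye(9)[1]                           # same molecule, different BioAssay
p_task0 = graph_memory_forward(atoms, bonds, task0, params)
p_task1 = graph_memory_forward(atoms, bonds, task1, params)
print(p_task0, p_task1)
```

Because the task enters only through the query vector, the same weights serve every BioAssay, which is what makes joint training on several small datasets possible in this kind of design. The weights here are random, so the printed probabilities carry no chemical meaning.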

Graph Memory Networks for Molecular Activity Prediction
