Adapting Differential Molecular Representation with Hierarchical Prompts for Multi-label Property Prediction

Linjia Kang, Songhua Zhou, Shuyan Fang, Shichao Liu
2024-08-11
Abstract: Accurate prediction of molecular properties is crucial in drug discovery. Traditional methods often overlook that real-world molecules typically exhibit multiple property labels with complex correlations. To this end, we propose a novel framework, HiPM, which stands for hierarchical prompted molecular representation learning framework. HiPM leverages task-aware prompts to enhance the differential expression of tasks in molecular representations and mitigate negative transfer caused by conflicts in individual task information. Our framework comprises two core components: the Molecular Representation Encoder (MRE) and the Task-Aware Prompter (TAP). MRE employs a hierarchical message-passing network architecture to capture molecular features at both the atom and motif levels. Meanwhile, TAP utilizes an agglomerative hierarchical clustering algorithm to construct a prompt tree that reflects task affinity and distinctiveness, enabling the model to consider multi-granular correlation information among tasks, thereby effectively handling the complexity of multi-label property prediction. Extensive experiments demonstrate that HiPM achieves state-of-the-art performance across various multi-label datasets, offering a novel perspective on multi-label molecular representation learning.
Quantitative Methods, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to accurately predict multi-label molecular properties in drug discovery. Traditional methods often overlook that real-world molecules usually carry multiple property labels with complex correlations among them. This leaves a significant gap in understanding a molecule's complete biological activity and ultimately reduces the efficiency of drug discovery. To address this challenge, the paper proposes a new framework, HiPM (Hierarchical Prompted Molecular representation learning framework), for multi-label molecular property prediction.

### Main problems:

1. **Complexity of multi-label property prediction**: Real-world molecules usually have multiple property labels, but existing research often ignores this, resulting in an incomplete picture of a molecule's biological activity.
2. **Exponential growth of the output space**: In multi-label learning, the output space grows exponentially with the number of labels. For example, 32 binary labels yield \(2^{32}\) (about 4.3 billion) possible label combinations.
3. **Gradient conflict**: Multi-label learning is a special case of multi-task learning, and the gradient directions of different labels may conflict, making it difficult for the model to optimize all labels simultaneously.
4. **Correlation among labels**: Correlations among labels can be complex: they may be pairwise, involve three labels, or even be shared by all labels.

### Solutions:

- **HiPM framework**: HiPM uses task-aware prompts to enhance the differential expression of tasks in molecular representations and to mitigate the negative transfer caused by conflicts in individual task information.
- **Molecular Representation Encoder (MRE)**: Uses a hierarchical message-passing network architecture to capture molecular features at the atom and motif levels.
- **Task-Aware Prompter (TAP)**: Uses an agglomerative hierarchical clustering algorithm to construct a prompt tree that reflects task affinity and distinctiveness, enabling the model to consider multi-granular correlation information among tasks.

### Experimental results:

- **Performance comparison**: Experiments on six multi-label datasets show that HiPM performs at a state-of-the-art level across all of them and obtains the best results on five.
- **Interpretability**: Beyond predictive performance, HiPM offers good interpretability and can effectively capture the correlations among tasks.

Through these methods, HiPM provides new perspectives and solutions for multi-label molecular property prediction.
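
To make the prompt-tree idea described under TAP more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that task affinity is measured as the Jaccard distance between each task's set of positively labeled molecules and that clusters are merged with average linkage. Both choices, along with the names `build_prompt_tree` and `prompt_path` and the toy task names, are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets of molecule ids."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def build_prompt_tree(task_positives):
    """Average-linkage agglomerative clustering over tasks (illustrative only).

    task_positives maps a task name to the set of molecule ids labeled
    positive for that task (an assumed affinity signal, not the paper's).
    Returns (parent, members): parent maps node id -> parent id (None for
    the root); members maps node id -> frozenset of task names below it.
    """
    tasks = sorted(task_positives)
    members = {t: frozenset([t]) for t in tasks}  # leaves are the tasks
    parent = {t: None for t in tasks}
    active = list(tasks)
    next_id = 0

    def linkage(u, v):
        # Mean pairwise task distance between the two clusters.
        pairs = [(s, t) for s in members[u] for t in members[v]]
        total = sum(
            jaccard_distance(task_positives[s], task_positives[t])
            for s, t in pairs
        )
        return total / len(pairs)

    while len(active) > 1:
        # Merge the two closest clusters under a new internal node.
        u, v = min(combinations(active, 2), key=lambda p: linkage(*p))
        node = f"cluster_{next_id}"
        next_id += 1
        members[node] = members[u] | members[v]
        parent[node] = None
        parent[u] = parent[v] = node
        active = [n for n in active if n not in (u, v)] + [node]
    return parent, members


def prompt_path(task, parent):
    """Node ids from the root down to the given task leaf."""
    path = [task]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


# Toy data: four hypothetical tasks and the molecules positive for each.
task_positives = {
    "tox_a": {1, 2, 3, 4},
    "tox_b": {2, 3, 4, 5},
    "sol": {7, 8, 9},
    "perm": {8, 9, 10},
}
parent, members = build_prompt_tree(task_positives)
for task in sorted(task_positives):
    print(task, "<-", prompt_path(task, parent))
```

In this toy example the two toxicity-like tasks overlap most and merge first, so `tox_a` and `tox_b` share an intermediate node below the root while `sol` and `perm` share another. A model in this style might attach a learnable prompt vector to each node and combine the prompts along a task's root-to-leaf path, so that related tasks share coarse-grained prompts while keeping their own leaf-level prompt.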