Motif Masking-based Self-Supervised Learning for Molecule Graph Representation Learning

Yasu Wu, Changlong Fu, Manwen Yang, Haoran Duan, Cheng Xie
DOI: https://doi.org/10.1109/icebe59045.2023.00040
2023-01-01
Abstract: Molecule graph representation is an emerging technique for chemical analysis. Integrating it with online drug sales makes it possible to understand drug properties in depth and to deliver personalized recommendations based on molecular information, which is valuable for intelligent analysis systems and for improving user experience. Recent work applies mask-based models to molecule graph representation with considerable success. However, existing mask-based molecular graph representation methods can only mask single nodes at random and cannot mask a key functional group in the molecular graph as a whole. As a result, the mutual information among the key features of the molecule is lost, which limits further improvement of molecular graph representation. To address this challenge, we propose a novel molecular graph representation method that uses a motif vocabulary to mask critical functional groups, supplementing the defining mutual information of the molecular graph. First, functional groups are discovered using the motif vocabulary. Then, the discovered functional groups are randomly masked in the original graph. Next, our method learns a molecular graph encoder that is enhanced by more discriminative node-node and graph-graph cross information. Finally, extensive experiments show that the graph representations produced by our method are better suited to downstream molecular graph classification tasks. The implementation is publicly available at https://gitee.com/wu-yasu/mmssl.git
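The difference between masking single nodes and masking a functional group as a whole can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_motifs`, the `mask_ratio` parameter, the `[MASK]` token, and the hand-written motif list are all assumptions made for illustration, and motif discovery itself (the motif vocabulary step) is taken as given.

```python
import random

MASK = "[MASK]"  # hypothetical mask token, not taken from the paper


def mask_motifs(atom_labels, motifs, mask_ratio=0.5, seed=None):
    """Mask whole motifs (groups of atom indices) rather than single atoms.

    atom_labels: list of atom symbols, one per node.
    motifs: list of lists of node indices; assumed to come from a
            motif vocabulary lookup, which is not implemented here.
    Returns (masked_labels, masked_node_indices, chosen_motif_indices).
    """
    rng = random.Random(seed)
    # Mask at least one motif whenever any motif exists.
    n_to_mask = max(1, round(len(motifs) * mask_ratio)) if motifs else 0
    chosen = rng.sample(range(len(motifs)), n_to_mask)
    # Every atom of a chosen motif is masked together.
    masked_idx = sorted({i for m in chosen for i in motifs[m]})
    masked_labels = list(atom_labels)
    for i in masked_idx:
        masked_labels[i] = MASK
    return masked_labels, masked_idx, sorted(chosen)


# Toy example: heavy atoms of acetic acid, CH3-COOH.
atom_labels = ["C", "C", "O", "O"]
motifs = [[0], [1, 2, 3]]  # assumed motifs: methyl carbon, carboxyl group

labels, idx, chosen = mask_motifs(atom_labels, motifs, mask_ratio=0.5, seed=0)
print(labels, idx, chosen)
```

In this sketch, whenever the carboxyl motif is selected, its three atoms are hidden together, so an encoder trained to reconstruct them would have to predict the functional group as a unit instead of recovering one atom from its unmasked neighbours.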