MOL-AE: Auto-Encoder Based Molecular Representation Learning With 3D Cloze Test Objective

Junwei Yang, Kangjie Zheng, Siyu Long, Zaiqing Nie, Ming Zhang, Xinyu Dai, Wei-Ying Ma, Hao Zhou
DOI: https://doi.org/10.1101/2024.04.13.589331
2024-06-05
Abstract: 3D molecular representation learning has gained tremendous interest and achieved promising performance in various downstream tasks. A series of recent approaches follow a prevalent framework: an encoder-only model coupled with a coordinate denoising objective. However, through a series of analytical experiments, we show that the encoder-only model with a coordinate denoising objective exhibits inconsistency between pre-training and downstream objectives, as well as issues with disrupted atomic identifiers. To address these two issues, we propose Mol-AE for molecular representation learning, an auto-encoder model that uses positional encoding as atomic identifiers. We also propose a new training objective, named 3D Cloze Test, so that the model learns better atom spatial relationships from real molecular substructures. Empirical results demonstrate that Mol-AE achieves a large-margin performance gain over the current state-of-the-art 3D molecular modeling approach. The source code of Mol-AE is publicly available at https://github.com/yjwtheonly/MolAE .
Bioinformatics