Abstract: Pretrained deep learning models self-supervised on large datasets of language, image, and graph representations are often fine-tuned on downstream tasks and have demonstrated remarkable adaptability in a variety of applications including chatbots, autonomous driving, and protein folding. Additional research aims to improve performance on downstream tasks by fusing high dimensional data representations across multiple modalities. In this work, we explore a novel fusion of a pretrained language model, ChemBERTa-2, with graph neural networks for the task of molecular property prediction. We benchmark the MolPROP suite of models on seven scaffold split MoleculeNet datasets and compare with state-of-the-art architectures. We find that (1) multimodal property prediction for small molecules can match or significantly outperform modern architectures on hydration free energy (FreeSolv), experimental water solubility (ESOL), lipophilicity (Lipo), and clinical toxicity tasks (ClinTox), (2) the MolPROP multimodal fusion is predominantly beneficial on regression tasks, (3) the ChemBERTa-2 masked language model pretraining task (MLM) outperformed multitask regression pretraining task (MTR) when fused with graph neural networks for multimodal property prediction, and (4) despite improvements from multimodal fusion on regression tasks, MolPROP significantly underperforms on some classification tasks. MolPROP has been made available at https://github.com/merck/MolPROP.
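
The abstract mentions scaffold-split datasets without describing the procedure. As a rough illustration only, the sketch below shows the general idea of a scaffold split: molecules sharing a core scaffold are kept in the same subset, so the test set contains scaffolds never seen in training. The scaffold strings are assumed to be precomputed (for example by a cheminformatics toolkit), the function name `scaffold_split` is hypothetical, and the 80/10/10 fractions are an assumption, not values taken from the paper.

```python
from collections import defaultdict


def scaffold_split(scaffolds, frac_train=0.8, frac_valid=0.1):
    """Split molecule indices so that no scaffold spans two subsets.

    `scaffolds` holds one precomputed scaffold string per molecule
    (hypothetical input; the fractions are assumed, not from the paper).
    """
    # Group molecule indices by their scaffold string.
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(scaffolds)
    train_cap = frac_train * n
    valid_cap = (frac_train + frac_valid) * n

    train, valid, test = [], [], []
    for group in ordered:
        # Whole groups are assigned together, never split across subsets.
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cap:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


# Toy example with made-up scaffold labels standing in for real scaffolds.
example_scaffolds = ["A"] * 6 + ["B"] * 2 + ["C", "D"]
train_idx, valid_idx, test_idx = scaffold_split(example_scaffolds)
```

Because whole scaffold groups are assigned at once, the realized subset sizes only approximate the requested fractions.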

What problem does this paper attempt to address?

The paper explores a multimodal fusion method that combines a pretrained language model with graph neural networks to predict small-molecule properties. Specifically, the researchers developed MolPROP, which integrates ChemBERTa-2 (a pretrained language model) with graph neural networks (Graph Convolutional Networks, GCN, and Graph Attention Networks, GATv2) to improve the accuracy of small-molecule property predictions.

The main contributions of the paper can be summarized as follows:

1. **Multimodal Fusion**: The authors explore a novel approach that combines a pretrained language model (ChemBERTa-2) with graph neural networks for the supervised task of molecular property prediction. The fusion significantly improves performance on certain regression tasks, and it leaves room to explore different fusion strategies for classification tasks.
2. **Performance Evaluation**: MolPROP was benchmarked on seven MoleculeNet datasets, covering regression tasks (such as hydration free energy, experimental water solubility, and lipophilicity) and classification tasks (such as inhibition of human β-secretase activity, blood-brain barrier permeability, and clinical toxicity). MolPROP matches or outperforms modern architectures on the regression tasks. Results on classification are mixed: it performs well on clinical toxicity prediction but underperforms on other classification tasks.
3. **Key Findings**:
   - Multimodal property prediction for small-molecule regression tasks can match or significantly outperform modern architectures.
   - The fusion of language and graph models is mainly beneficial for regression tasks.
   - When fused with graph neural networks, the masked language model pretraining task (MLM) of ChemBERTa-2 performs better than the multitask regression pretraining task (MTR).
   - Despite improvements on regression tasks, MolPROP significantly underperforms on some classification tasks.

In summary, this study demonstrates the potential of combining language models and graph neural networks for small-molecule property prediction, with good results on regression tasks, while also revealing open challenges on classification tasks.
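
The summary above does not say how the language and graph representations are actually combined. Purely as an illustration of what fusing two modalities can look like, the sketch below assumes the simplest option: mean-pool per-atom graph embeddings, concatenate the result with a molecule-level language embedding, and apply a linear regression head. This is an assumed late-fusion scheme, not MolPROP's documented mechanism, and all function names and numbers are hypothetical.

```python
def mean_pool(node_embeddings):
    """Average per-atom vectors into a single graph-level vector."""
    n = len(node_embeddings)
    dim = len(node_embeddings[0])
    return [sum(node[d] for node in node_embeddings) / n for d in range(dim)]


def fuse(language_embedding, node_embeddings):
    """Concatenate the language vector with the pooled graph vector.

    Concatenation is an assumption made for illustration; the source
    does not specify the fusion operator.
    """
    return list(language_embedding) + mean_pool(node_embeddings)


def linear_head(features, weights, bias):
    """Map the fused feature vector to one scalar property prediction."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Made-up inputs: a 2-d language embedding and two atoms with 2-d embeddings.
language_embedding = [0.5, -1.0]
node_embeddings = [[1.0, 3.0], [3.0, 5.0]]

fused = fuse(language_embedding, node_embeddings)
prediction = linear_head(fused, weights=[1.0, 1.0, 0.5, 0.25], bias=0.1)
```

In a real model the embeddings would come from the pretrained language model and the graph network, and the head's weights would be learned during fine-tuning rather than fixed by hand.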

MolPROP: Molecular Property prediction with multimodal language and graph fusion

Multimodal Fusion with Relational Learning for Molecular Property Prediction

Multimodal fused deep learning for drug property prediction: Integrating chemical language and molecular graph

Integrating Chemical Language and Molecular Graph in Multimodal Fused Deep Learning for Drug Property Prediction

Improving Molecular Properties Prediction Through Latent Space Fusion

Beyond Chemical Language: A Multimodal Approach to Enhance Molecular Property Prediction

Extracting Molecular Properties from Natural Language with Multimodal Contrastive Learning

Boosting the performance of molecular property prediction via graph–text alignment and multi-granularity representation enhancement

DIG-Mol: A Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction

Chemprop: A Machine Learning Package for Chemical Property Prediction

MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures

Multi-Modal Representation Learning for Molecular Property Prediction: Sequence, Graph, Geometry

Fast and Effective Molecular Property Prediction with Transferability Map

Synergistic Fusion of Graph and Transformer Features for Enhanced Molecular Property Prediction

Explainable Molecular Property Prediction: Aligning Chemical Concepts with Predictions via Language Models

MolMVC: Enhancing molecular representations for drug-related tasks through multi-view contrastive learning

Analyzing Learned Molecular Representations for Property Prediction

EMPPNet: Enhancing Molecular Property Prediction via Cross-modal Information Flow and Hierarchical Attention

MolCloze - A Unified Cloze-style Self-supervised Molecular Structure Learning Model for Chemical Property Prediction

Contrastive Dual-Interaction Graph Neural Network for Molecular Property Prediction

Can Large Language Models Empower Molecular Property Prediction?