Abstract: Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, from using simple physical and chemical properties and molecular fingerprints in statistical models and classical machine learning to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pre-training data, optimal architecture selections, and promising pre-training objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to enhance the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.

What problem does this paper attempt to address?

The paper reviews research from the past five years on using Transformer models for molecular property prediction (MPP) and discusses the key decisions involved in training and fine-tuning them. It aims to distill insights from current work: it analyzes the available models and examines the choice and scale of pretraining data, architecture selection, and promising pretraining objectives. The authors highlight areas that current research has not yet covered and encourage further exploration. They also point out how difficult it is to compare different models, and they emphasize the need for standardized data splitting and robust statistical analysis.

MPP has important applications in drug discovery, crop protection, and environmental science. Traditionally, prediction methods relied on simple physical and chemical properties and on molecular fingerprints; in recent years, deep learning methods, particularly Transformer models, have been changing this landscape. Originally developed for natural language processing, Transformer models learn context-dependent relationships through self-attention, which may make them suitable for capturing the complex non-additivity in molecular data. According to the summary, however, their performance is limited by the typically small pretraining datasets. The paper analyzes the pretraining and downstream datasets in use, along with the decisions that arise when implementing a Transformer model: database selection, molecular representation, molecular tokenization, position embedding, parameter count, pretraining objectives, and fine-tuning strategies.

The authors also discuss the balance between fixed features (such as molecular fingerprints) and deep learning models in practical applications, and they present self-supervised learning as a potential way around the bottleneck of small datasets. In summary, the paper addresses how to optimize Transformer models for more accurate molecular property prediction, especially when data are limited, and how to improve their generalization through better pretraining strategies, model design, and data processing.
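To make the "molecular tokenization" decision mentioned above more concrete, here is a minimal sketch of atom-level tokenization of a SMILES string using a regular expression. The pattern and the function name `tokenize_smiles` are illustrative assumptions and are not taken from the paper; published models differ in their exact token vocabularies, and this sketch only splits the string without checking that it is chemically valid.

```python
import re

# Hypothetical atom-level pattern, written for illustration only.
# Alternatives are tried left to right, so multi-character tokens
# (bracket atoms, Br, Cl, %NN) are listed before single characters.
SMILES_TOKEN_PATTERN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms such as [N+] or [C@@H]
    r"|Br|Cl"                # two-letter halogens
    r"|%\d{2}"               # two-digit ring-closure labels such as %10
    r"|[A-Za-z]"             # single-letter atoms (aliphatic or aromatic)
    r"|\d"                   # single-digit ring-closure labels
    r"|[=#\-+()/\\.@:~*$]"   # bonds, branches, and other symbols
)


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into atom-level tokens.

    Raises ValueError if some characters are not covered by the pattern,
    so that nothing is silently dropped from the input.
    """
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"Untokenizable characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    # Acetyl chloride: "Cl" stays a single token instead of "C" + "l".
    print(tokenize_smiles("CC(=O)Cl"))
    # prints ['C', 'C', '(', '=', 'O', ')', 'Cl']
```

Keeping `Cl`, `Br`, and bracket atoms as single tokens is one common design choice; character-level or learned subword vocabularies are alternatives, and which of these works best for MPP is among the questions the review examines.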

Transformers for molecular property prediction: Lessons learned from the past five years

KnoMol: A Knowledge-Enhanced Graph Transformer for Molecular Property Prediction

Understanding the Limitations of Deep Models for Molecular Property Prediction: Insights and Solutions.

Pre-training Transformers for Molecular Property Prediction Using Reaction Prediction

Transferring a molecular foundation model for polymer property predictions

Advanced deep learning methods for molecular property prediction

Transformer-based molecular optimization beyond matched molecular pairs

Molecular Descriptors Property Prediction Using Transformer-Based Approach

Fast and Effective Molecular Property Prediction with Transferability Map

Algebraic graph-assisted bidirectional transformers for molecular property prediction

Few-shot learning with transformers via graph embeddings for molecular property prediction

Synergistic Fusion of Graph and Transformer Features for Enhanced Molecular Property Prediction

Advancements in Molecular Property Prediction: A Survey of Single and Multimodal Approaches

3D-Transformer: Molecular Representation with Transformer in 3D Space

INTransformer: Data augmentation-based contrastive learning by injecting noise into transformer for molecular property prediction

KPGT: Knowledge-Guided Pre-training of Graph Transformer for Molecular Property Prediction

Transformer-based deep learning for predicting protein properties in the life sciences

A review of transformers in drug discovery and beyond

ABT-MPNN: an atom-bond transformer-based message-passing neural network for molecular property prediction

Dynamic Molecular Graph-based Implementation for Biophysical Properties Prediction

Impact of Domain Knowledge and Multi-Modality on Intelligent Molecular Property Prediction: A Systematic Survey