Transformers for molecular property prediction: Lessons learned from the past five years

Afnan Sultan, Jochen Sieg, Miriam Mathea, Andrea Volkamer
2024-04-05
Abstract: Molecular Property Prediction (MPP) is vital for drug discovery, crop protection, and environmental science. Over the last decades, diverse computational techniques have been developed, from using simple physical and chemical properties and molecular fingerprints in statistical models and classical machine learning to advanced deep learning approaches. In this review, we aim to distill insights from current research on employing transformer models for MPP. We analyze the currently available models and explore key questions that arise when training and fine-tuning a transformer model for MPP. These questions encompass the choice and scale of the pre-training data, optimal architecture selections, and promising pre-training objectives. Our analysis highlights areas not yet covered in current research, inviting further exploration to enhance the field's understanding. Additionally, we address the challenges in comparing different models, emphasizing the need for standardized data splitting and robust statistical analysis.
Machine Learning, Computation and Language, Quantitative Methods
What problem does this paper attempt to address?
The paper reviews the past five years of research on using Transformer models for molecular property prediction (MPP) and discusses the key questions that arise when training and fine-tuning them. Its aim is to distill insights from current research: it analyzes the available models and examines the choice of pretraining data, the selection of architecture, and promising pretraining objectives. The authors highlight areas that current research has not yet covered and encourage further exploration to improve understanding of the field. They also point out how difficult it is to compare different models and emphasize the need for standardized data splitting and robust statistical analysis.
MPP has important applications in drug discovery, crop protection, and environmental science. Traditionally, prediction methods have relied on simple physical and chemical properties and on molecular fingerprints, but in recent years deep learning methods, particularly Transformer models, have been changing this landscape. Originally developed for natural language processing, Transformer models learn context-dependent relationships through self-attention, which may make them suitable for capturing the complex non-additivity in molecular data. However, their performance is limited by the typically small pretraining datasets. The paper analyzes the different types of pretraining and downstream datasets, as well as the decisions to be made when implementing a Transformer model: database selection, molecular representation, molecular tokenization, position embedding, parameter count, pretraining objectives, and fine-tuning strategies.
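Molecular tokenization, one of the implementation decisions listed above, can be made concrete with a small example. The following is a minimal sketch of atom-level tokenization of a SMILES string, assuming SMILES is the chosen molecular representation. The regular expression and the function name `tokenize_smiles` are illustrative assumptions, not taken from the paper, and the pattern covers only a common subset of SMILES syntax.

```python
import re

# Assumed, simplified token pattern (not from the paper). Alternatives are
# tried left to right: bracket atoms, two-letter halogens, two-digit ring
# closures, one-letter atoms, aromatic atoms, bond/branch symbols, ring digits.
SMILES_TOKEN_PATTERN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|%\d{2}|[BCNOSPFI]|[bcnosp]|[()=#\-+\\/:~@?>*$.]|\d)"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into atom-level tokens."""
    tokens = SMILES_TOKEN_PATTERN.findall(smiles)
    # findall silently skips characters that match nothing, so check that
    # the tokens reassemble into the original string.
    if "".join(tokens) != smiles:
        raise ValueError(f"Unrecognized characters in SMILES: {smiles!r}")
    return tokens
```

For example, `tokenize_smiles("C[C@H](N)C(=O)O")` returns `['C', '[C@H]', '(', 'N', ')', 'C', '(', '=', 'O', ')', 'O']`: the bracketed stereocenter stays a single token, while every other symbol becomes its own token. Other schemes, such as character-level or learned subword tokenization, would split the same string differently.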
The authors also discuss the balance between fixed features (such as molecular fingerprints) and deep learning models in practical applications, and they present self-supervised learning as a potential way to overcome the bottleneck of small datasets. In summary, the paper addresses how to optimize Transformer models for more accurate molecular property prediction, especially when data are limited, and how to improve the models' generalization through better pretraining strategies, model design, and data processing.
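To illustrate how self-supervised pretraining can produce training signal from unlabeled molecules, here is a minimal sketch of the data preparation step for masked-token prediction, a common self-supervised objective in language modeling. The summary does not say which objectives the paper covers, so this choice is an assumption. The `[MASK]` symbol, the 15% default masking rate, and the function name `mask_tokens` are likewise illustrative, and the sketch always substitutes the mask symbol rather than using a more elaborate replacement scheme.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical placeholder symbol


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_inputs, labels) for a masked-token objective.

    labels[i] holds the original token where position i was masked and
    None elsewhere, so a loss would be computed on masked positions only.
    """
    rng = random.Random(seed)  # local generator keeps the result reproducible
    masked_inputs, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked_inputs.append(MASK_TOKEN)
            labels.append(token)
        else:
            masked_inputs.append(token)
            labels.append(None)
    return masked_inputs, labels
```

A model pretrained this way would receive `masked_inputs` and be asked to recover the hidden tokens recorded in `labels`. No property labels are needed, which is why this kind of objective is attractive when labeled data are scarce.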