Performance and robustness of small molecule retention time prediction with molecular graph neural networks in industrial drug discovery campaigns

Daniel Vik, David Pii, Chirag Mudaliar, Mads Nørregaard-Madsen, Aleksejs Kontijevskis

DOI: https://doi.org/10.1038/s41598-024-59620-4

IF: 4.6

2024-04-17

Scientific Reports

Abstract: This study explores how machine learning can be used to predict chromatographic retention times (RT) for the analysis of small molecules, with the objective of identifying a machine-learning framework with the robustness required to support a chemical synthesis production platform. We used internally generated data from high-throughput parallel synthesis in the context of pharmaceutical drug discovery projects. We tested machine-learning models from the following frameworks: XGBoost, ChemProp, and DeepChem, using a dataset of 7552 small molecules. Our findings show that two specific models, AttentiveFP and ChemProp, performed better than XGBoost and a regular neural network in predicting RT accurately. We also assessed how well these models performed over time and found that molecular graph neural networks consistently gave accurate predictions for new chemical series. In addition, when we applied ChemProp to the publicly available METLIN SMRT dataset, it performed impressively, with an average error of 38.70 s. These results highlight the efficacy of molecular graph neural networks, especially ChemProp, in diverse RT prediction scenarios, thereby enhancing the efficiency of chromatographic analysis.
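The abstract reports model quality as an "average error" in seconds. A minimal sketch of how such a figure could be computed is shown below, assuming the metric is the mean absolute error (MAE) between observed and predicted retention times; the abstract does not name the metric. The function name and all retention time values are hypothetical and chosen only for illustration; they are not data from the paper.

```python
from statistics import mean


def mean_absolute_error(observed, predicted):
    """Return the mean absolute difference between two equal-length sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("at least one value is required")
    return mean(abs(o - p) for o, p in zip(observed, predicted))


# Hypothetical retention times in seconds (illustrative values only).
observed_rt = [312.0, 455.5, 601.0, 789.0]
predicted_rt = [300.0, 470.5, 590.0, 800.0]

mae = mean_absolute_error(observed_rt, predicted_rt)
print(f"MAE: {mae:.2f} s")
```

Under this assumption, a lower MAE means the predicted retention times lie closer, on average, to the measured ones.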

multidisciplinary sciences

What problem does this paper attempt to address?

The paper examines how machine learning can predict the chromatographic retention time (RT) of small molecules, with the aim of improving the efficiency and accuracy of high-throughput parallel synthesis in drug discovery. Using an internally generated dataset of 7,552 small molecules, the authors tested models from three frameworks: XGBoost, ChemProp, and DeepChem. Two molecular graph neural networks, AttentiveFP and ChemProp, outperformed the other models at predicting RT. In particular, ChemProp combined with RDKit descriptors was both accurate and robust over time. The authors also evaluated performance over time and found that molecular graph neural networks consistently gave accurate predictions for new chemical series. Applied to the publicly available METLIN SMRT dataset, ChemProp achieved an average error of 38.7 seconds. Overall, the study seeks a retention time prediction model robust enough to support a chemical synthesis production platform in industrial-scale drug discovery, and it identifies ChemProp as an effective and adaptable choice that can make chromatographic analysis more efficient, particularly for data from large-scale screening technologies.
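Evaluating a model "over time" can be pictured as a chronological split: train on compounds measured before a cutoff date, then test on compounds measured afterwards, so that new chemical series appear only in the test set. The sketch below illustrates that idea under stated assumptions and is not the authors' protocol. The records, the cutoff date, and the `temporal_split` helper are hypothetical, and a simple mean-of-training-set baseline stands in for a real model such as ChemProp.

```python
from datetime import date
from statistics import mean

# Hypothetical records: (SMILES, measurement date, observed RT in seconds).
records = [
    ("CCO", date(2022, 1, 10), 60.0),
    ("CCN", date(2022, 3, 5), 70.0),
    ("c1ccccc1", date(2022, 6, 20), 200.0),
    ("CC(=O)O", date(2023, 1, 15), 80.0),
    ("c1ccccc1O", date(2023, 4, 2), 180.0),
]


def temporal_split(rows, cutoff):
    """Split rows into (before cutoff, on or after cutoff) by measurement date."""
    train = [r for r in rows if r[1] < cutoff]
    test = [r for r in rows if r[1] >= cutoff]
    return train, test


train, test = temporal_split(records, date(2023, 1, 1))

# Stand-in "model": always predict the mean RT of the training set.
baseline_rt = mean(rt for _, _, rt in train)

# Mean absolute error of the baseline on the later, unseen compounds.
mae = mean(abs(rt - baseline_rt) for _, _, rt in test)
print(f"train={len(train)} test={len(test)} baseline MAE={mae:.1f} s")
```

Unlike a random split, a chronological split prevents later compounds from leaking into training, which is closer to how a model would be used on a running synthesis platform.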

Retention time prediction for small samples based on integrating molecular representations and adaptive network

An Adaptive Graph Learning Method for Automated Molecular Interactions and Properties Predictions

Machine learning to predict retention time of small molecules in nano-HPLC

Insights into predicting small molecule retention times in liquid chromatography using deep learning

Retention Time Prediction with Message-Passing Neural Networks

Analyzing Learned Molecular Representations for Property Prediction

Transfer learning for small molecule retention predictions

Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models

Machine learning framework to predict pharmacokinetic profile of small molecule drugs based on chemical structure

Bridging Chemical Knowledge and Machine Learning for Performance Prediction of Organic Synthesis

Performance Insights for Small Molecule Drug Discovery Models: Data Scaling, Multitasking, and Generalization

Machine Learning Small Molecule Properties in Drug Discovery

Multi‐Task ADME/PK prediction at industrial scale: leveraging large and diverse experimental datasets

RT-Transformer: Retention time prediction for metabolite annotation to assist in metabolite identification

Advanced graph and sequence neural networks for molecular property prediction and drug discovery

The use of soil extractants to estimate plant-available molybdenum and selenium in potentially toxic soils

Prediction of Organic Reaction Outcomes Using Machine Learning

Deep Neural Network Pretrained by Weighted Autoencoders and Transfer Learning for Retention Time Prediction of Small Molecules

Complex machine learning model needs complex testing: Examining predictability of molecular binding affinity by a graph neural network