Anomeric Selectivity of Glycosylations Through a Machine Learning Lens

Natasha Videcrantz Faurschou, Victor Friis, Priyanka Raghavan, Christian Marcus Pedersen, Connor W. Coley
DOI: https://doi.org/10.26434/chemrxiv-2024-jw9dx
2024-11-15
Abstract: Predicting the stereoselectivity of glycosylations is a major challenge in carbohydrate chemistry. Herein we show that it is possible to build machine learning models that can predict the major anomer of a glycosylation, whether the other anomer is observed as the minor product, and the anomeric ratio of the two anomers. The three models are integrated into a publicly available tool, GlycoPredictor. From a statistical analysis of literature data, we analyze glycosylation trends and compare them to known trends in the field of carbohydrate chemistry, making it possible to elucidate a hierarchy of rules governing the stereoselectivity of glycosylations and discover promising new trends that complement expert intuition.
Chemistry
What problem does this paper attempt to address?
This paper addresses the problem of predicting the stereoselectivity of glycosylation reactions. The authors build machine-learning models for three tasks:

1. **Predicting the major anomer**: determining whether the major product of the glycosylation is the α or the β anomer.
2. **Predicting whether the minor anomer is observed**: determining whether the other anomer also appears in the reaction product.
3. **Predicting the anomeric ratio**: if the minor anomer is observed, predicting the ratio of the major to the minor anomer.

To achieve these goals, the authors trained a variety of machine-learning models on literature data sets and integrated them into a publicly available tool, GlycoPredictor. Through a statistical analysis of the literature data, they also examined trends in glycosylation reactions and compared them with known trends in carbohydrate chemistry. This reveals a hierarchy of rules governing the stereoselectivity of glycosylations and uncovers new trends that complement expert intuition.

### Main contributions

1. **Model construction**: Machine-learning models capable of predicting the stereoselectivity of glycosylation reactions were successfully constructed.
2. **Data diversity**: The models use a highly diverse data set drawn from the literature, avoiding the limitations of manual analysis.
3. **New trend discovery**: New stereoselectivity principles were discovered algorithmically through machine-learning methods and are supported by the literature data.

### Method overview

- **Data preparation**: Single-step reaction data published from 2010 to 2015 were extracted from the CAS Content Collection and filtered to keep only glycosylation reactions.
- **Feature representation**: Reactants and conditions are represented using several methods, including graph neural networks (GNN), fingerprints (FP), and one-hot encoding (OHE).
- **Model training**: Three types of models were trained: graph-based, fingerprint-based, and one-hot-encoding-based.
- **Performance evaluation**: Model performance is evaluated under several data splits, such as random splitting, leaving-group splitting, and scaffold splitting.

### Results

- **Major anomer prediction**: The AUC is 0.97 under random splitting and 0.93 under publication splitting.
- **Minor anomer prediction**: The AUC is 0.95 under random splitting and 0.88 under publication splitting.
- **Anomer ratio prediction**: Under random splitting the RMSE is 19.6% and the R² is 0.55; under publication splitting the RMSE is 27.0% and the R² is 0.11.

### Conclusion

By constructing and validating these machine-learning models, the authors demonstrate the potential of machine learning for predicting the stereoselectivity of glycosylation reactions and provide new tools and methods for research in carbohydrate chemistry.
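
The gap between the random-split and publication-split numbers in the Results is easier to interpret with the evaluation mechanics in view. The following is a minimal sketch, not the authors' implementation: it shows a publication-grouped split (every reaction from one paper lands on the same side) together with plain definitions of AUC, RMSE, and R². All function names and the `publication` record field are hypothetical, and the sketch assumes the anomeric ratio is expressed as a percentage of one anomer.

```python
import math
import random
from collections import defaultdict


def publication_split(records, test_fraction=0.2, seed=0):
    """Split records so that no publication contributes to both train and test.

    Assumption: each record is a dict with a hypothetical "publication" key.
    """
    by_pub = defaultdict(list)
    for rec in records:
        by_pub[rec["publication"]].append(rec)
    pubs = sorted(by_pub)
    random.Random(seed).shuffle(pubs)
    n_test = max(1, round(test_fraction * len(pubs)))
    test_pubs = set(pubs[:n_test])
    train = [r for p in pubs if p not in test_pubs for r in by_pub[p]]
    test = [r for p in pubs if p in test_pubs for r in by_pub[p]]
    return train, test


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Quadratic in the number of samples, which is
    fine for a sketch but not for large data sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the targets."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data (invented for illustration only): 10 reactions from 5 publications.
toy = [{"publication": f"pub{i % 5}", "alpha_percent": 10.0 * i} for i in range(10)]
train, test = publication_split(toy, test_fraction=0.2, seed=0)
print(len(train), "train reactions;", len(test), "test reactions")
```

Because reactions from the same paper tend to share donors, acceptors, and conditions, a random split can place near-duplicates on both sides. Grouping by publication removes that overlap, which is one plausible reading of why the reported scores drop under the publication split.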