How to actively learn chemical reaction yields in real-time using stopping criteria

Vincent Delmas, Denis Jacquemin, Aymeric Blondel, Morgane Vacher, Adèle D. Laurent
DOI: https://doi.org/10.1039/d3re00628j
IF: 5.2002
2024-01-27
Reaction Chemistry & Engineering
Abstract: Chemical reactions are central to the creation of new materials, to drug design and to many other fields. Access to high reaction yields is of great importance to reduce cost and to increase the efficiency and the purity of the obtained product. Active Learning (AL) is an interesting approach for reducing the number of experiments needed to screen for high reaction yields in organic chemistry. Unfortunately, most AL studies are "retro-AL", in which all the reactions are already available. One problem of "real-time" AL is determining when to stop the AL loop without building an external labeled test set to analyze the performance of the model. The stopping procedure presented in this work is a stopping criterion, namely the Stabilization Prediction (SP) from [Bloodgood et al., Proceedings of the Thirteenth Conference on Computational Natural Language Learning, 2009, 39-47]. It uses an unlabeled equivalent of a test set, called a stop set, to indirectly evaluate the accuracy of the AL loop. To benchmark the stability of this method and investigate its applicability to chemistry, two datasets from the organic literature, four estimators, three types of descriptors, two numbers of queries per iteration (QPI) and different stop set sizes were investigated. We find that the method is the most stable with an SVC estimator, 50 QPI and a stop set containing 30% of the data. This setting gives the best compromise between an early stop (consuming less than 50% of the data) and an accuracy, over 10 different runs, that is reliable compared with the accuracy obtained with classical supervised machine learning. We hope that this method will be of use in creating "real-time" AL in chemistry.
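The stopping idea summarized above can be sketched as follows. This is a minimal, non-authoritative illustration, assuming that SP is applied as it is usually described for Bloodgood and Vijay-Shanker's criterion: after each AL iteration, the current classifier predicts labels for the unlabeled stop set, Cohen's kappa measures agreement with the previous model's predictions, and the loop stops once the mean kappa over a window of recent comparisons reaches a threshold. The window of 3 and threshold of 0.99 are assumed defaults, not values taken from this paper. The names `cohen_kappa`, `split_stop_set` and `StabilizingPredictions` are hypothetical, and the stop set here is assumed to be disjoint from the query pool (30% by default, following the abstract). Labels such as 1 = high yield, 0 = low yield are also illustrative.

```python
import random
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa between two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each model's label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both models predict the same single label everywhere: full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def split_stop_set(pool, fraction: float = 0.3, seed: int = 0) -> Tuple[list, list]:
    """Hypothetical helper: hold out an unlabeled stop set from the pool."""
    items = list(pool)
    random.Random(seed).shuffle(items)
    k = int(round(fraction * len(items)))
    return items[:k], items[k:]


class StabilizingPredictions:
    """Stop when successive models agree on the stop set (assumed window/threshold)."""

    def __init__(self, window: int = 3, threshold: float = 0.99):
        self.window = window
        self.threshold = threshold
        self.previous: Optional[List[Hashable]] = None
        self.kappas: List[float] = []

    def update(self, predictions: Sequence[Hashable]) -> bool:
        """Record the current model's stop-set predictions; return True to stop."""
        predictions = list(predictions)
        if self.previous is not None:
            self.kappas.append(cohen_kappa(self.previous, predictions))
        self.previous = predictions
        if len(self.kappas) < self.window:
            return False
        recent = self.kappas[-self.window:]
        return sum(recent) / self.window >= self.threshold


if __name__ == "__main__":
    # Made-up stop-set predictions from successive AL iterations.
    history = [
        [0, 0, 1, 0, 1, 0],
        [0, 1, 1, 0, 1, 0],
        [0, 1, 1, 0, 1, 1],
        [0, 1, 1, 0, 1, 1],
        [0, 1, 1, 0, 1, 1],
        [0, 1, 1, 0, 1, 1],
    ]
    sp = StabilizingPredictions(window=3, threshold=0.99)
    for iteration, preds in enumerate(history):
        if sp.update(preds):
            print(f"stop after iteration {iteration}")  # prints: stop after iteration 5
            break
```

Because no labels are needed on the stop set, the check can run during a real experimental campaign. In a full loop, `predictions` would come from the current estimator (for example an SVC) after each batch of queries. The pure-Python kappa avoids dependencies; scikit-learn's `cohen_kappa_score` could be used instead.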
Chemistry, Multidisciplinary; Engineering, Chemical
What problem does this paper attempt to address?