Powerful and Reliable Prediction using Latent Variables of Experimentally Unobservable Reactions in Organic Synthesis

Norie MOMIYAMA, Kazuhiro TAKEDA, Naoya OHTSUKA, Toshiyasu SUZUKI
DOI: https://doi.org/10.26434/chemrxiv-2023-bvvdb-v5
2024-08-01
Abstract: In this study, a novel machine learning algorithm was designed to assist in the development of organic reactions. This algorithm addresses the complexities inherent in batch-type organic reactions, including the necessity for numerous experiments and the effects of intricate characteristics of reaction pathways. By integrating molecular relationships and actual yields from observable reactions, the algorithm is used to estimate untested yields via extrapolation. An approach based on Bayesian optimization and dual annealing optimization is employed to compute expected values and evaluate plausibility. The algorithm’s dual-loop structure, incorporating latent variables and experimental values, maximizes the coefficient of determination. Physicochemical aspects of the algorithm are validated using natural bond orbital charges, and its utility in synthesizing perfluoroiodinated naphthalenes is demonstrated. The algorithm exhibits potential for application in predicting experimentally unobservable reactions, thereby advancing the field of synthetic organic chemistry.
Chemistry
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper proposes a new machine learning algorithm to assist in the development of organic reactions. The algorithm targets two difficulties inherent in batch-type organic reactions: the large number of experiments they require and the influence of intricate reaction pathways. By integrating molecular relationships with the yields actually measured for observable reactions, it estimates untested yields via extrapolation. Bayesian optimization and dual annealing optimization are used to compute expected values and evaluate their plausibility. The algorithm has a dual-loop structure that combines latent variables and experimental values to maximize the coefficient of determination. Physicochemical aspects of the algorithm are validated using natural bond orbital charges, and its utility is demonstrated in the synthesis of perfluoroiodinated naphthalenes.

### Specific Research Content

1. **Objective**: Develop an algorithm that estimates untested yields through extrapolation.
2. **Methods**:
   - The algorithm integrates the relationships between molecules with the yields actually observed in tested reactions.
   - Bayesian optimization and dual annealing optimization are used to compute expected values and evaluate their plausibility.
   - A dual-loop structure combines latent variables and experimental values to maximize the coefficient of determination.
3. **Validation**: Physicochemical aspects of the algorithm are validated using natural bond orbital charges.
4. **Application**: The algorithm's utility is demonstrated in the synthesis of perfluoroiodinated naphthalenes.

Through these methods, the paper aims to advance synthetic organic chemistry, particularly the prediction of reactions that cannot be observed experimentally.
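The idea of treating unobserved yields as latent variables and choosing them to maximize the coefficient of determination can be illustrated with a toy sketch. This is not the authors' implementation: the descriptor values, the yields, the linear model, and the function names (`r_squared`, `objective`, `anneal`) are all assumptions made for illustration. A plain simulated annealing loop from the standard library stands in for dual annealing, and the Bayesian optimization step is not shown. The outer loop proposes latent yields; the inner step fits a least-squares line to observed plus latent points and scores it by R².

```python
import math
import random


def r_squared(xs, ys):
    """R^2 of the ordinary least-squares line fitted through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or sst == 0:
        return 0.0
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - sse / sst


# Hypothetical data: one descriptor per reaction and the measured yield (%).
observed_x = [0.10, 0.20, 0.30, 0.40]
observed_y = [22.0, 38.0, 61.0, 79.0]
# Hypothetical descriptors of reactions whose yields were never measured.
latent_x = [0.45, 0.50]


def objective(latent_y):
    """Inner step: fit observed + latent points together and return R^2."""
    return r_squared(observed_x + latent_x, observed_y + list(latent_y))


def anneal(n_steps=20000, seed=0, lo=0.0, hi=100.0):
    """Outer loop: simulated annealing over the latent yields within [lo, hi]."""
    rng = random.Random(seed)
    current = [rng.uniform(lo, hi) for _ in latent_x]
    current_score = objective(current)
    best, best_score = list(current), current_score
    for step in range(n_steps):
        temperature = 1.0 - step / n_steps  # linear cooling from 1 towards 0
        sigma = 10.0 * temperature + 0.1    # proposal width shrinks as it cools
        candidate = [min(hi, max(lo, y + rng.gauss(0.0, sigma))) for y in current]
        score = objective(candidate)
        # Always accept improvements; accept a worse candidate with a
        # probability that decreases as the temperature drops.
        accept = score >= current_score or rng.random() < math.exp(
            (score - current_score) / (0.01 * temperature + 1e-9)
        )
        if accept:
            current, current_score = candidate, score
            if score > best_score:
                best, best_score = list(candidate), score
    return best, best_score


best_latent, best_score = anneal()
print("estimated latent yields:", [round(y, 1) for y in best_latent])
print("R^2 with latent yields:", round(best_score, 4))
```

In this toy setting the latent yields are pulled towards the trend set by the observed points, which is the sense in which the estimate is an extrapolation. A more faithful version would replace the line fit with the paper's model of molecular relationships and add a plausibility check on the estimated values.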