Active Learning High Coverage Sets of Complementary Reaction Conditions

Nicholas Jackson, David Friday, Sofia Sivilotti
DOI: https://doi.org/10.26434/chemrxiv-2024-37bpl
2024-11-13
Abstract: Chemical reaction conditions capable of producing high yields over diverse reactants will be a key component of future self-driving labs. While much work has been done to discover general reaction conditions, any single set of conditions is necessarily limited over increasingly diverse chemical spaces. A potential solution to this problem is to identify small sets of complementary reaction conditions that, when combined, cover a much larger chemical space than any one general reaction condition. In this work, we analyze experimentally derived datasets to assess the relative performance of individual general reaction conditions vs. sets of complementary reaction conditions. We then propose and benchmark active learning methods to efficiently discover these complementary sets of conditions. The results show the value of active learning in exploring sets of reaction conditions and provide an avenue for improving synthetic hit rates in high-throughput synthesis campaigns.
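The idea of a complementary set covering more chemical space than any single condition can be illustrated with a small sketch. It assumes a boolean success matrix (rows are reactants, columns are conditions) and defines coverage as the fraction of reactants that succeed under at least one condition in the set. The toy data, the function names `coverage` and `best_set`, and the brute-force search are all hypothetical and are not taken from the paper.

```python
from itertools import combinations

import numpy as np


def coverage(success, conditions):
    """Fraction of reactants that succeed under at least one chosen condition."""
    return success[:, list(conditions)].any(axis=1).mean()


def best_set(success, k):
    """Exhaustively search all size-k condition sets for the highest coverage."""
    n_conditions = success.shape[1]
    return max(
        combinations(range(n_conditions), k),
        key=lambda s: coverage(success, s),
    )


# Hypothetical toy data: 6 reactants x 3 conditions, True = reaction succeeded.
success = np.array(
    [
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 1],
        [1, 1, 0],
        [0, 1, 0],
        [0, 1, 1],
    ],
    dtype=bool,
)

single = best_set(success, 1)  # best individual "general" condition
pair = best_set(success, 2)    # best complementary pair
print(single, coverage(success, single))
print(pair, coverage(success, pair))
```

In this toy matrix the best single condition leaves two reactants uncovered, while the best pair covers all six. Exhaustive search is only practical for small numbers of conditions, which is one reason a data-efficient search strategy is attractive.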
Chemistry
What problem does this paper attempt to address?
This paper addresses the problem of achieving wide coverage with chemical reaction conditions, in particular how to find a set of complementary reaction conditions in high-throughput synthesis that covers a broader chemical space. Specifically, the authors focus on:

1. **Limitations of a single general reaction condition**:
   - A single general reaction condition is insufficient in the face of an increasingly diverse chemical space and cannot guarantee high yields for all reactants.
2. **The need to explore sets of complementary reaction conditions**:
   - To overcome the limitations of a single condition, the authors propose identifying small sets of complementary reaction conditions that, when combined, cover a larger chemical space than any single general condition.
3. **Using active learning (AL) strategies to accelerate discovery**:
   - Since acquiring experimental data is costly and time-consuming, the authors introduce active learning methods to discover these sets of complementary reaction conditions efficiently, so that a good combination of conditions can be found with fewer experimental runs.

### Research background and motivation

With the application of artificial intelligence (AI) to chemical optimization, significant progress has been made in many chemical sub-fields such as catalysis, drug discovery, and materials discovery. However, a challenge in high-throughput synthesis is ensuring that the selected molecules can be successfully synthesized under known reaction conditions. To achieve this, it is usually necessary to define a synthesizable chemical space and use machine learning (ML) to select molecules for synthesis and testing. This requires reaction types and conditions that can cover the predefined chemical space.

### Main research content

1. **Data analysis**:
   - Analyze existing experimental datasets and evaluate the relative performance of a single general reaction condition and sets of complementary reaction conditions.
2. **Development and testing of active learning methods**:
   - Propose and test multiple active learning strategies to efficiently discover sets of complementary reaction conditions.
3. **Result verification**:
   - The results show that active learning is valuable in exploring sets of reaction conditions and can improve the success rate of high-throughput synthesis campaigns.

### Formula representation

Some of the formulas involved in the paper include:

- Probability prediction of reaction success:
  \[ \phi_{r,c} \]
  where \( r \) represents the reactant and \( c \) represents the reaction condition. A value of 0 represents definite failure, 1 represents definite success, and 0.5 represents complete ignorance.
- Exploration function:
  \[ \text{Explore}_{r,c} = 1 - 2\,|\phi_{r,c} - 0.5| \]
- Exploitation function:
  \[ \text{Exploit}_{r,c} = \frac{1}{|C|} \left[ \gamma\{c\} + \sum_{c_i \in C \setminus \{c\}} \gamma\{c, c_i\}\,(1 - \phi_{r, c_i}) \right] \]
- Combined function of exploration and exploitation:
  \[ \text{Combined}_{r,c} = \alpha \cdot \text{Explore}_{r,c} + (1 - \alpha) \cdot \text{Exploit}_{r,c} \]

### Summary

By analyzing experimental data and developing active learning algorithms, the authors demonstrate the advantage of sets of complementary reaction conditions in covering a larger chemical space and provide an effective method for discovering these sets quickly. This research offers new ideas and techniques for high-throughput synthesis and helps to improve synthesis success rates and efficiency.
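
The exploration, exploitation, and combined scores from the formula section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not define \( \gamma \), so the sketch treats it as a user-supplied function `gamma` that maps a set of condition indices to a weight, and the constant `uniform_gamma`, the toy `phi` matrix, and the argmax selection rule are all hypothetical.

```python
import numpy as np


def explore(phi):
    """1 - 2|phi - 0.5|: equals 1 when phi = 0.5 and 0 when phi is 0 or 1."""
    return 1.0 - 2.0 * np.abs(phi - 0.5)


def exploit(phi, gamma):
    """Exploitation score for every (reactant, condition) pair.

    phi:   array of shape (n_reactants, n_conditions) with success probabilities.
    gamma: ASSUMED callable taking a frozenset of condition indices and
           returning a weight; its true definition is not given in this summary.
    """
    n_reactants, n_conditions = phi.shape
    scores = np.zeros((n_reactants, n_conditions))
    for c in range(n_conditions):
        # gamma{c} term, identical for every reactant
        total = np.full(n_reactants, float(gamma(frozenset({c}))))
        # sum over the other conditions c_i of gamma{c, c_i} * (1 - phi[r, c_i])
        for c_i in range(n_conditions):
            if c_i != c:
                total += gamma(frozenset({c, c_i})) * (1.0 - phi[:, c_i])
        scores[:, c] = total / n_conditions  # 1 / |C| prefactor
    return scores


def combined(phi, gamma, alpha=0.5):
    """alpha * Explore + (1 - alpha) * Exploit."""
    return alpha * explore(phi) + (1.0 - alpha) * exploit(phi, gamma)


# Hypothetical predictions for 2 reactants x 2 conditions.
phi = np.array([[0.5, 1.0], [0.0, 0.25]])


def uniform_gamma(condition_set):
    """Placeholder weighting that treats every condition set equally."""
    return 1.0


scores = combined(phi, uniform_gamma, alpha=0.5)
# One possible acquisition rule: run the highest-scoring (reactant, condition) next.
next_r, next_c = np.unravel_index(np.argmax(scores), scores.shape)
print(scores)
print(next_r, next_c)
```

Setting `alpha` closer to 1 favors pairs whose outcome is most uncertain, while values closer to 0 favor the exploitation term.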