ZeoSyn: A Comprehensive Zeolite Synthesis Dataset Enabling Machine-Learning Rationalization of Hydrothermal Parameters

Elton Pan, Soonhyoung Kwon, Zach Jensen, Mingrou Xie, Rafael Gómez-Bombarelli, Manuel Moliner, Yuriy Román-Leshkov, Elsa Olivetti
DOI: https://doi.org/10.1021/acscentsci.3c01615
IF: 18.2
2024-03-06
ACS Central Science
Abstract: Zeolites, nanoporous aluminosilicates with well-defined porous structures, are versatile materials with applications in catalysis, gas separation, and ion exchange. Hydrothermal synthesis is widely used for zeolite production, offering control over composition, crystallinity, and pore size. However, the intricate interplay of synthesis parameters necessitates a comprehensive understanding of synthesis-structure relationships to optimize the synthesis process. Hitherto, public zeolite synthesis databases only contain a subset of parameters and are small in scale, comprising up to a few thousand synthesis routes. We present ZeoSyn, a dataset of 23,961 zeolite hydrothermal synthesis routes, encompassing 233 zeolite topologies and 921 organic structure-directing agents (OSDAs). Each synthesis route comprises comprehensive synthesis parameters: 1) gel composition, 2) reaction conditions, 3) OSDAs, and 4) zeolite products. Using ZeoSyn, we develop a machine learning classifier to predict the resultant zeolite given a synthesis route with >70% accuracy. We employ SHapley Additive exPlanations (SHAP) to uncover key synthesis parameters for >200 zeolite frameworks. We introduce an aggregation approach to extend SHAP to all building units. We demonstrate applications of this approach to phase-selective and intergrowth synthesis. This comprehensive analysis illuminates the synthesis parameters pivotal in driving zeolite crystallization, offering the potential to guide the synthesis of desired zeolites. The dataset is available at https://github.com/eltonpan/zeosyn_dataset.
chemistry, multidisciplinary
What problem does this paper attempt to address?
This paper addresses how to construct a comprehensive zeolite hydrothermal synthesis dataset (ZeoSyn) and how to use machine-learning methods to reveal and optimize the complex parameter relationships in zeolite synthesis, thereby guiding the synthesis of specific zeolite structures. Specifically, it targets the following problems:

1. **Limitations of existing datasets**:
   - Existing public zeolite synthesis databases contain only a subset of parameters and are small in scale, comprising up to a few thousand synthesis routes. This restricts understanding of the full zeolite synthesis space.
   - No comprehensive dataset covers all key parameters (such as gel composition, reaction conditions, and organic structure-directing agents (OSDAs)), which leads to data scarcity and sparsity.
2. **Understanding the relationship between synthesis parameters and structures**:
   - Zeolite synthesis involves many variables (such as framework heteroatoms, inorganic and organic cations, structure-directing agents, mineralizers, and hydrothermal conditions), and the complex interactions among them need to be understood more deeply.
   - Large-scale data analysis can reveal which synthesis parameters play a key role in forming specific zeolite structures, providing guidance for optimizing the synthesis process.
3. **Predicting and designing new zeolite structures**:
   - Use machine-learning models to predict the zeolite product of a given synthesis route, improving the success rate and efficiency of synthesis.
   - Analyze how different synthesis parameters influence zeolite crystallization to guide the design and discovery of new zeolite structures.
4. **Handling failed synthesis experiments**:
   - Include negative data (i.e., experimental conditions under which no zeolite was successfully synthesized) to counter the literature's bias toward reporting only successful results and to help researchers understand which conditions may lead to failure.

By addressing these problems, the ZeoSyn dataset and the associated machine-learning models can provide a more systematic and comprehensive understanding of zeolite synthesis and accelerate the development and application of new zeolite materials.
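
To make the SHAP step mentioned in the abstract more concrete, the following is a minimal sketch of the Shapley-value idea that SHAP builds on. It is not the authors' pipeline: the feature names, the toy scoring function, and the all-zero baseline are hypothetical stand-ins for a trained classifier's output for one framework. It also uses exact enumeration over feature subsets, which is only feasible for a handful of features, whereas SHAP tooling typically relies on faster model-specific or sampling-based estimators.

```python
from itertools import combinations
from math import factorial

# Hypothetical synthesis features; these names are illustrative, not ZeoSyn columns.
FEATURES = ["si_al_ratio", "h2o_t_ratio", "cryst_temp", "osda_volume"]


def toy_score(x):
    """Hypothetical stand-in for a classifier's score for one zeolite framework."""
    return (
        2.0 * x["si_al_ratio"]
        + 1.0 * x["cryst_temp"]
        + 0.5 * x["si_al_ratio"] * x["osda_volume"]
    )


def value(subset, x, baseline):
    """Score when features in `subset` come from x and all others from the baseline."""
    mixed = {f: (x[f] if f in subset else baseline[f]) for f in FEATURES}
    return toy_score(mixed)


def shapley_values(x, baseline):
    """Exact Shapley values by enumerating every subset of the other features."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Marginal contribution of f when added to subset s.
                total += weight * (value(s | {f}, x, baseline) - value(s, x, baseline))
        phi[f] = total
    return phi


# Example: one (normalized, made-up) synthesis route against an all-zero baseline.
route = {f: 1.0 for f in FEATURES}
baseline = {f: 0.0 for f in FEATURES}
phi = shapley_values(route, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:.3f}")
```

In this toy setup the attributions sum to the difference between the score for the route and the score for the baseline, and the interaction term is split evenly between the two features it involves. A feature the score ignores (here `h2o_t_ratio`) receives zero attribution.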