PolyUniverse: Generation of a Large-scale Polymer Library Using Rule-Based Polymerization Reactions for Polymer Informatics

Tianle Yue, Jianxin He, Ying Li
DOI: https://doi.org/10.26434/chemrxiv-2024-7069c
2024-07-12
Abstract: Recent advancements in machine learning have revolutionized polymer research, leading to the swift integration of diverse computational techniques for de novo molecular design. A crucial aspect of these processes is to expand the number of candidate polymer structures, as the currently known real polymer structures are very limited. In contrast, small molecule databases are vast, offering extensive opportunities for the design of new molecules, such as drug discovery. In this study, we collected extensive small molecule compounds from GDB-17, GDB-13, and PubChem, and selected polymerization reaction pathways for eight types of polymers, including polyimide, polyolefin, polyester, polyamide, polyurethane, epoxy, polybenzimidazole (PBI), and vitrimer. These small molecule datasets and polymerization reactions enabled us to generate hundreds of quadrillions of hypothetical polymer structures. For each of the eight polymers, along with one promising copolymer, poly(imide-imine), we randomly generated over one million hypothetical structures, except for PBI, for which we created 10,000 structures. Chemical space visualization using t-distributed stochastic neighbor embedding and synthetic accessibility scores were employed to assess the feasibility of synthesizing these new polymers. Customized feedforward neural network models predicted thermal, mechanical, and gas permeation properties for both real and hypothetical polymers. Results show that many hypothetical polymers, especially polyimides, exhibit significant potential, often surpassing real polymers in performance, particularly for high-temperature applications and gas separation. Our findings highlight the immense potential of large-scale hypothetical polymer libraries for materials discovery and design. These libraries not only aid in identifying promising polymer materials through high-throughput screening but also provide valuable datasets for training advanced machine learning models, such as large language models. This research also demonstrates the power of data-driven approaches in polymer science, paving the way for the development of next-generation polymeric materials with superior properties for diverse industrial applications.
Chemistry
What problem does this paper attempt to address?
The paper addresses a central problem in polymer materials design: how to computationally generate a very large number of hypothetical polymer structures, and how to assess their synthetic feasibility and predicted performance. Specifically, the research objectives include:
1. **Expanding the number of candidate polymer structures**: The number of experimentally synthesized polymer structures is limited, whereas small molecule databases are vast. To bring this resource to bear on polymer design, the authors collected small molecule compounds from GDB-17, GDB-13, and PubChem.
2. **Generating hypothetical polymer structures**: The authors defined polymerization reaction pathways for eight types of polymers (polyimide, polyolefin, polyester, polyamide, polyurethane, epoxy, polybenzimidazole [PBI], and vitrimer). Combined with the small molecule datasets, these pathways make hundreds of quadrillions of hypothetical polymer structures accessible. For each of the eight polymer types and for one copolymer, poly(imide-imine), over one million structures were randomly generated, except for PBI, for which 10,000 structures were created.
3. **Evaluating synthetic feasibility**: Chemical space visualization with t-distributed stochastic neighbor embedding (t-SNE) and synthetic accessibility scores were used to assess how feasible the generated structures are to synthesize.
4. **Predicting performance**: Customized feedforward neural network models were used to predict the thermal, mechanical, and gas permeation properties of both real and hypothetical polymers. These predictions help identify promising new polymers, especially for high-temperature applications and gas separation.
5. **Demonstrating the potential of the hypothetical polymer library**: The results indicate that many hypothetical polymers, particularly polyimides, show significant potential and often outperform known real polymers in the predicted properties.
In summary, this study successfully constructed a large-scale hypothetical polymer library by integrating machine learning techniques, polymerization reaction rules, and extensive small molecule compound databases, providing a valuable resource for new material discovery and design.
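The gap between the size of the accessible library and the number of structures actually generated (objective 2) can be illustrated with a minimal sketch. It assumes a two-monomer reaction, such as the common dianhydride plus diamine route to polyimides, so that the number of possible polymers is the product of the two monomer pool sizes, and it draws a random subset of pairings without enumerating all of them. The function names, monomer labels, and pool sizes below are hypothetical placeholders, not the authors' code or data, and the chemical reaction step itself (which would normally be applied with a cheminformatics toolkit) is omitted.

```python
import random
from typing import List, Sequence, Tuple


def library_size(n_a: int, n_b: int) -> int:
    """Number of distinct A-B pairings for a two-monomer reaction."""
    return n_a * n_b


def sample_pairs(
    monomers_a: Sequence[str],
    monomers_b: Sequence[str],
    k: int,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Draw k distinct (a, b) monomer pairs uniformly at random.

    Sampling is done over flat indices in range(total), so the full
    list of pairings is never materialized in memory.
    """
    n_b = len(monomers_b)
    total = library_size(len(monomers_a), n_b)
    if k > total:
        raise ValueError("cannot draw more distinct pairs than exist")
    rng = random.Random(seed)
    flat_indices = rng.sample(range(total), k)
    # Map each flat index back to a (row, column) position in the A x B grid.
    return [(monomers_a[i // n_b], monomers_b[i % n_b]) for i in flat_indices]


# Hypothetical placeholder labels standing in for real monomer structures.
dianhydrides = [f"dianhydride_{i}" for i in range(200)]
diamines = [f"diamine_{j}" for j in range(500)]

pairs = sample_pairs(dianhydrides, diamines, k=1000)
print("possible pairings:", library_size(len(dianhydrides), len(diamines)))
print("sampled pairings:", len(pairs))
```

Each sampled pair would then be passed to the relevant polymerization rule to produce a repeat-unit structure. Sampling over index space rather than over an enumerated list is what keeps this approach practical when the pools contain millions of monomers each.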