Abstract: Recent advancements in machine learning have revolutionized polymer research, leading to the swift integration of diverse computational techniques for de novo molecular design. A crucial aspect of these processes is expanding the number of candidate polymer structures, because the number of known real polymer structures is very limited. In contrast, small-molecule databases are vast and offer extensive opportunities for designing new molecules, as in drug discovery. In this study, we collected a large set of small-molecule compounds from GDB-17, GDB-13, and PubChem, and selected polymerization reaction pathways for eight types of polymers: polyimide, polyolefin, polyester, polyamide, polyurethane, epoxy, polybenzimidazole (PBI), and vitrimer. These small-molecule datasets and polymerization reactions enabled us to generate hundreds of quadrillions of hypothetical polymer structures. For each of the eight polymers, along with one promising copolymer, poly(imide-imine), we randomly generated over one million hypothetical structures, except for PBI, for which we created 10,000 structures. Chemical space visualization using t-distributed stochastic neighbor embedding (t-SNE) and synthetic accessibility scores were employed to assess the feasibility of synthesizing these new polymers. Customized feedforward neural network models predicted thermal, mechanical, and gas permeation properties for both real and hypothetical polymers. The results show that many hypothetical polymers, especially polyimides, exhibit significant potential, often surpassing real polymers in predicted performance, particularly for high-temperature applications and gas separation. Our findings highlight the potential of large-scale hypothetical polymer libraries for materials discovery and design. These libraries not only aid in identifying promising polymer materials through high-throughput screening but also provide valuable datasets for training advanced machine learning models, such as large language models. This research also demonstrates the power of data-driven approaches in polymer science, paving the way for the development of next-generation polymeric materials with superior properties for diverse industrial applications.
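To illustrate why pairing monomer pools yields such large candidate spaces, and how a random subset can be drawn without enumerating every combination, the following is a minimal sketch rather than the authors' pipeline. It assumes a two-component step-growth polymer (here a polyamide formed from a diamine and a diacid). The function names `pair_space_size` and `sample_pairs`, the tiny monomer lists, and the pool sizes in the last line are all hypothetical and chosen only for illustration.

```python
import random


def pair_space_size(n_a: int, n_b: int) -> int:
    """Number of distinct A-B pairings when every monomer A can react with every monomer B."""
    return n_a * n_b


def sample_pairs(monomers_a, monomers_b, k, seed=0):
    """Draw k distinct (a, b) monomer pairs uniformly at random.

    The full Cartesian product is never materialized: each pair is addressed
    by a flat index in range(len(a) * len(b)) and decoded on demand.
    """
    n_b = len(monomers_b)
    total = pair_space_size(len(monomers_a), n_b)
    if k > total:
        raise ValueError("cannot sample more pairs than exist")
    rng = random.Random(seed)
    # Sampling from a range object is memory-efficient and gives distinct indices.
    flat_indices = rng.sample(range(total), k)
    return [(monomers_a[i // n_b], monomers_b[i % n_b]) for i in flat_indices]


# Illustrative SMILES only (assumed inputs, not taken from the study's datasets).
diamines = ["NCCN", "NCCCCCCN", "Nc1ccc(N)cc1"]
diacids = ["OC(=O)CCCCC(=O)O", "OC(=O)c1ccc(cc1)C(=O)O"]

pairs = sample_pairs(diamines, diacids, k=4, seed=1)
for diamine, diacid in pairs:
    print(diamine, "+", diacid)

# Hypothetical pool sizes showing how quickly the pairing count grows.
print(pair_space_size(10**9, 10**8))
```

In a real workflow each sampled pair would then be passed through a polymerization reaction rule to produce a repeat-unit structure; that step is omitted here because it depends on a cheminformatics toolkit and on reaction definitions that the abstract does not specify.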

Is BigSMILES the Friend of Polymer Machine Learning?

Generative BigSMILES: an extension for polymer informatics, computer simulations & ML/AI

Enhancing Copolymer Property Prediction through the Weighted-Chained-SMILES Machine Learning Framework

Automated BigSMILES conversion workflow and dataset for homopolymeric macromolecules

SMiPoly: Generation of a Synthesizable Polymer Virtual Library Using Rule-Based Polymerization Reactions

Machine learning enables interpretable discovery of innovative polymers for gas separation membranes

Landau theory of a constrained ferroelastic in two dimensions.

Integration of Machine Learning and Coarse-Grained Molecular Simulations for Polymer Materials: Physical Understandings and Molecular Design

PolyUniverse: Generation of a Large-scale Polymer Library Using Rule-Based Polymerization Reactions for Polymer Informatics

Can Large Language Models Empower Molecular Property Prediction?

Extrapolative ML Models for Copolymers

Learning to SMILE(S)

A review on the application of molecular descriptors and machine learning in polymer design

Interpretable Machine Learning Strategies for Accurate Prediction of Thermal Conductivity in Polymeric Systems

Polymer informatics: Current status and critical next steps

Extrapolative Machine Learning Models for Copolymers

Challenges and opportunities of polymer design with machine learning and high throughput experimentation

A Novel Molecular Representation Learning for Molecular Property Prediction with a Multiple SMILES-Based Augmentation