On-Demand Reverse Design of Polymers with PolyTAO

Haoke Qiu, Zhao-Yan Sun
DOI: https://doi.org/10.26434/chemrxiv-2024-3z7tw-v2
2024-06-06
Abstract: The forward screening and reverse design of drug molecules, inorganic molecules, and polymers with enhanced properties are vital for accelerating the transition from laboratory research to market application. Specifically, due to the scarcity of large-scale datasets, the discovery of polymers via materials informatics is particularly challenging. Nonetheless, scientists have developed various machine learning models for polymer structure-property relationships using only small polymer datasets, thereby advancing the forward screening process of polymers. However, the success of this approach ultimately depends on the diversity of the candidate pool, and exhaustively enumerating all possible polymer structures through human imagination is impractical. Consequently, achieving on-demand reverse design of polymers is essential. In this work, we curate an immense polymer dataset containing nearly one million polymeric structure-property pairs based on expert knowledge. Leveraging this dataset, we propose a Transformer-Assisted Oriented pretrained model for on-demand polymer generation (PolyTAO). This model produces polymers with 99.27% chemical validity in top-1 generation mode (approximately 200k generated polymers), representing the highest reported success rate among polymer generative models. Additionally, the average R² between the properties of the generated polymers and their expected values across 15 predefined properties is 0.96. To further evaluate the pretrained model's performance in generating polymers with additional user-defined properties for downstream tasks, we conduct fine-tuning experiments on three publicly available small polymer datasets using both semi-template and template-free generation paradigms.
Through these extensive experiments, we demonstrate that our pretrained model and its fine-tuned versions are capable of achieving on-demand reverse design of polymers with specified properties, whether in semi-template generation or the more challenging template-free generation scenarios, showcasing its potential as a unified pretrained foundation model for polymer generation.
Chemistry
What problem does this paper attempt to address?
This paper addresses the reverse design of polymers: generating new polymer structures that meet specified property requirements. Machine learning models already exist for forward screening (picking polymers with desired properties out of a pool of known structures), but they are trained on small datasets, and their success is bounded by the diversity of the candidate pool, which cannot be enumerated exhaustively by human imagination. To address this, the authors curate a large dataset of nearly one million polymer structure-property pairs and, building on it, propose PolyTAO, a Transformer-based pretrained model for on-demand polymer generation. In top-1 generation mode, 99.27% of approximately 200,000 generated polymers are chemically valid, which the authors report as the highest success rate among polymer generative models. Across 15 predefined properties, the average coefficient of determination (R²) between the properties of the generated polymers and their target values is 0.96, indicating that the generated polymers closely match the requested properties. The paper then tests whether the model can handle additional user-defined properties by fine-tuning it on three publicly available small polymer datasets under both semi-template and template-free generation paradigms. These experiments indicate that the pretrained model and its fine-tuned versions can perform on-demand reverse design of polymers with specified properties, both in the semi-template setting and in the more challenging template-free setting. Through this approach, PolyTAO expands the accessible chemical space of polymers and generates polymers with diverse structural features, suggesting that it can explore the polymer space broadly and could serve as a unified pretrained foundation model for polymer generation.
Overall, this work provides a new strategy for designing polymer materials and may help accelerate the translation of laboratory research into market applications.
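
The two headline numbers quoted above, the chemical validity rate and the average R² between requested and obtained property values, can be made concrete with a small sketch. This is an illustration of how such metrics are commonly computed, not the authors' evaluation code. All function names are hypothetical, the property names and numbers in the usage note are made up, and `toy_is_valid` is only a stand-in (it checks balanced parentheses); a real pipeline would typically delegate validity checking to a cheminformatics toolkit. Treating the requested property values as the reference in R² is also an assumption.

```python
from typing import Callable, Mapping, Sequence, Tuple


def validity_rate(structures: Sequence[str], is_valid: Callable[[str], bool]) -> float:
    """Fraction of generated strings accepted by the supplied validity check."""
    if not structures:
        raise ValueError("no structures given")
    return sum(1 for s in structures if is_valid(s)) / len(structures)


def r_squared(target: Sequence[float], achieved: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: the requested (target) values are the reference and the
    properties of the generated polymers are compared against them.
    """
    if len(target) != len(achieved) or len(target) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_target = sum(target) / len(target)
    ss_res = sum((t - a) ** 2 for t, a in zip(target, achieved))
    ss_tot = sum((t - mean_target) ** 2 for t in target)
    if ss_tot == 0:
        raise ValueError("target values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def mean_r_squared(
    per_property: Mapping[str, Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Average R^2 over several properties, each given as (target, achieved)."""
    if not per_property:
        raise ValueError("no properties given")
    scores = [r_squared(t, a) for t, a in per_property.values()]
    return sum(scores) / len(scores)


def toy_is_valid(structure: str) -> bool:
    """Hypothetical placeholder check: balanced parentheses only.

    This is NOT a chemical validity test; it just keeps the sketch runnable.
    """
    depth = 0
    for ch in structure:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

As a usage example, `validity_rate(generated, toy_is_valid)` returns the share of strings in `generated` that pass the check, and `mean_r_squared({"property_a": (targets_a, achieved_a), "property_b": (targets_b, achieved_b)})` averages the per-property R² values in the same way the reported 0.96 is an average over 15 properties.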