PolyNC: a natural and chemical language model for unified polymer properties prediction

Haoke Qiu, Lunyang Liu, Xuepeng Qiu, Xuemin Dai, Xiangling Ji, Zhao-Yan Sun
DOI: https://doi.org/10.1039/d3sc05079c
IF: 8.4
2023-12-08
Chemical Science
Abstract: Language models exhibit a profound aptitude for addressing multimodal and multidomain challenges, a competency that eludes the majority of off-the-shelf machine learning models. Consequently, language models hold great potential for comprehending the intricate interplay between material compositions and diverse properties, thereby accelerating material design, particularly in the realm of polymers. While past limitations in polymer data hindered the use of data-intensive language models, the growing availability of standardized polymer data and effective data augmentation techniques now opens doors to previously uncharted territories. Here, we present a revolutionary model to enable rapid and precise prediction of Polymer properties via the power of Natural language and Chemical language (PolyNC). To showcase the efficacy of PolyNC, we have meticulously curated a labeled prompt-structure-property corpus encompassing 22,970 polymer data points on a series of essential polymer properties. Through the use of natural language prompts, PolyNC gains a comprehensive understanding of polymer properties, while employing chemical language (SMILES) to describe polymer structures. In a unified text-to-text manner, PolyNC consistently demonstrates exceptional performance on both regression tasks (such as property prediction) and the classification task (polymer classification). Simultaneous and interactive multitask learning enables PolyNC to holistically grasp the structure-property relationships of polymers. Through a combination of experiments and characterizations, the generalization ability of PolyNC has been demonstrated, with attention analysis further indicating that PolyNC effectively learns structural information about polymers from multimodal inputs.
This work provides compelling evidence of the potential for deploying end-to-end language models in polymer research, representing a significant advancement in the AI community's dedicated pursuit of advancing polymer science.
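As a rough illustration of the unified text-to-text formulation described in the abstract, the sketch below shows one way a prompt-structure-property record could be serialized into a source string and a target string. It is not taken from the paper: the prompt wording, the `prompt: SMILES` separator, the use of `*` to mark repeat-unit attachment points, the two-decimal formatting, and the names `Example` and `make_example` are all assumptions made for illustration, and the label values are placeholders rather than measured data.

```python
from dataclasses import dataclass


@dataclass
class Example:
    source: str  # text fed to the model: natural-language prompt plus SMILES
    target: str  # text the model is trained to emit


def make_example(prompt: str, smiles: str, label) -> Example:
    """Join a task prompt and a polymer SMILES into one text-to-text pair.

    Hypothetical helper: the exact input format used by PolyNC is not
    specified in the abstract, so this layout is an assumption.
    """
    source = f"{prompt.strip()}: {smiles.strip()}"
    # Numeric labels are written out as text so that regression and
    # classification share the same string-valued output format.
    target = f"{label:.2f}" if isinstance(label, float) else str(label)
    return Example(source, target)


# Placeholder values only; neither label is a measured property or a real class name.
regression = make_example(
    "Predict the glass transition temperature", "*CC(*)c1ccccc1", 123.456
)
classification = make_example(
    "Classify the polymer", "*CC(*)c1ccccc1", "hypothetical_class"
)
```

Because both tasks reduce to mapping one string to another, a single sequence-to-sequence model could in principle be trained on a mixture of such pairs, which is the sense in which the formulation is "unified".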
chemistry, multidisciplinary