Enhancing Copolymer Property Prediction through the Weighted-Chained-SMILES Machine Learning Framework

Qi Huang, Zuowei Chen, Ziwei Lin, Weimin Li, Wenjie Yu, Lei Zhu
DOI: https://doi.org/10.1021/acsapm.3c02715
2024-03-25
ACS Applied Polymer Materials
Abstract: Accurately predicting copolymer properties plays a pivotal role in the field of polymer informatics. This endeavor necessitates a comprehensive understanding of polymer structures, adept feature engineering, and proficient application of machine learning algorithms. In traditional methodologies, features for each monomer structure were generated independently, thus segregating features from individual monomers. This approach results in a less informative representation, with limited applicability. To address these challenges, we introduce an innovative machine learning framework, named weighted-chained-SMILES. By constructing a representative SMILES notation, more intricate information can be encapsulated within the generated features. Our experimental results to predict the thermal properties demonstrate that our approach not only delivers competitive predictive performance but also exhibits enhanced adaptability across a diverse range of molecular representations. The versatility showcased by our model suggests promising potential for tackling more complex copolymer systems and extending its predictive capabilities to various other polymer properties.
polymer science, materials science, multidisciplinary
What problem does this paper attempt to address?
The paper addresses the challenges of predicting copolymer properties by proposing a machine learning framework called Weighted-Chained-SMILES (WCS). Specifically, it targets the following issues:

1. **Limitations of existing methods**: Traditional methods generate features for each monomer structure independently, which yields an incomplete representation with limited applicability.
2. **Complexity of copolymer structures**: Copolymers consist of two or more different repeating units, so their structures are more complex than those of homopolymers, which makes property prediction harder.
3. **Insufficient feature representation**: Existing methods do not adequately account for how the monomers are connected or in what proportions they occur, so the resulting features are incomplete.

### Solution Overview

- **WCS Framework**: The framework constructs a representative SMILES string for the copolymer and extracts features from it. This retains the advantages of fixed-length features while capturing more information, especially about the structure at the monomer junctions.
- **Experimental Validation**: The paper validates the method on experimental datasets, evaluating prediction performance for glass transition temperature (Tg) and thermal decomposition temperature (Td). The results show that the method under the WCS framework has better predictive performance than traditional methods.
- **Model Adaptability and Scalability**: The framework is broadly applicable and has potential for future use, especially in handling more complex copolymer systems and extending to the prediction of other polymer properties.

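
The idea of building one representative SMILES string from several monomers can be illustrated with a minimal sketch. The exact construction used in the paper is not given here, so everything below is an assumption made for illustration: the function names (`strip_endpoints`, `integer_counts`, `weighted_chained_smiles`) are hypothetical, each repeat unit is assumed to be written with `*` attachment points as its first and last characters, molar fractions are rounded to repeat counts out of a chosen `total`, and the units are chained in simple blocks.

```python
from functools import reduce
from math import gcd


def strip_endpoints(repeat_unit: str) -> str:
    """Drop the leading and trailing '*' attachment markers of a repeat unit.

    Assumption: the unit is written head-to-tail, e.g. '*CC(C)*', and closes
    all of its own rings, so units can be joined by plain concatenation.
    """
    if len(repeat_unit) < 3 or not (
        repeat_unit.startswith("*") and repeat_unit.endswith("*")
    ):
        raise ValueError(f"expected '*...*' repeat unit, got {repeat_unit!r}")
    return repeat_unit[1:-1]


def integer_counts(fractions, total=10):
    """Convert molar fractions to small integer repeat counts.

    Fractions are rounded to counts out of `total` (an illustrative
    resolution choice) and then reduced by their greatest common divisor.
    """
    if not fractions:
        raise ValueError("at least one fraction is required")
    counts = [round(f * total) for f in fractions]
    if any(c < 1 for c in counts):
        raise ValueError("every monomer needs a fraction of at least 1/total")
    divisor = reduce(gcd, counts)
    return [c // divisor for c in counts]


def weighted_chained_smiles(units, fractions, total=10):
    """Chain repeat units into one SMILES, repeating each by its weight.

    Hypothetical block-style chaining: all copies of the first unit, then
    all copies of the second, and so on. The junction between two different
    units appears explicitly in the resulting string.
    """
    if len(units) != len(fractions):
        raise ValueError("units and fractions must have the same length")
    counts = integer_counts(fractions, total)
    body = "".join(strip_endpoints(u) * c for u, c in zip(units, counts))
    return f"*{body}*"


if __name__ == "__main__":
    ethylene, propylene = "*CC*", "*CC(C)*"
    # Equal fractions reduce to one copy of each unit.
    print(weighted_chained_smiles([ethylene, propylene], [0.5, 0.5]))  # *CCCC(C)*
```

Because the chained string is a single SMILES, any fingerprint or descriptor generator that accepts SMILES can be applied to it directly, which is one way to read the claim that the approach adapts to different molecular representations. A per-monomer baseline, by contrast, would featurize `*CC*` and `*CC(C)*` separately and never see the bond that joins them.
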
In summary, this study aims to improve the accuracy of copolymer property prediction by developing a new machine learning framework, especially considering the monomer proportions and connection methods, with the goal of providing more powerful tools for the field of polymer informatics.