CopDDB: a descriptor database for copolymers and its applications to machine learning

Miho Hatanaka, Takayoshi Yoshimura, Hiromoto Kato, Shunto Oikawa, Taichi Inagaki, Tetsunori Sugawara, Tomonori Miyao, Takamitsu Matsubara, Hiroharu Ajiro, Mikiya Fujii, Yu-ya Ohnishi, Shigehito Asano
DOI: https://doi.org/10.26434/chemrxiv-2024-fzrgp-v3
2024-11-28
Abstract: Polymer informatics, which involves applying data-driven science to polymers, has attracted considerable research interest. However, developing adequate descriptors for polymers, particularly copolymers, to facilitate machine learning (ML) models with limited data sets remains a challenge. To address this issue, we computed sets of parameters, including reaction energies and activation barriers of elementary reactions in the early stage of radical polymerization, for 2500 radical–monomer pairs derived from 50 commercially available monomers and constructed an open database named “Copolymer Descriptor Database.” Furthermore, we built ML models using our descriptors as explanatory variables and physical properties such as the reactivity ratio, monomer conversion, monomer composition ratio, and molecular weight as objective variables. These models achieved high predictive accuracy, demonstrating the potential of our descriptors to advance the field of polymer informatics.
Chemistry
What problem does this paper attempt to address?
This paper aims to develop descriptors that make machine-learning models usable for copolymers even when only a limited data set is available. To that end, the researchers computed parameters such as reaction energies and activation barriers for 2,500 radical-monomer pairs and compiled them into an open database, the "Copolymer Descriptor Database" (CopDDB). These descriptors serve as inputs to machine-learning models that predict physical properties of copolymers: the reactivity ratio, monomer conversion, monomer composition ratio, and molecular weight. The goal is to predict the properties of untested monomers or monomer pairs more accurately when searching for copolymers with specific properties, that is, to obtain reliable extrapolation.

The key points of the paper are as follows:

1. **Descriptor calculation and database construction**: The researchers calculated descriptors for 2,500 radical-monomer pairs, including reaction energies, activation barriers, electronic parameters, and geometric parameters, and collected them in CopDDB.
2. **Application of machine-learning models**: Using the CopDDB descriptors as explanatory variables, they built multiple machine-learning models to predict various physical properties of copolymers.
3. **Case studies**: Three case studies tested the effectiveness of the CopDDB descriptors. The first predicted the reactivity ratio \( r_1 \), the second predicted the physical properties of binary copolymers, and the third used Bayesian optimization (BO) to find process variables that achieve the desired physical properties.
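To make the link between activation barriers and the predicted quantities concrete, the following is a minimal sketch and not the authors' method (the paper uses ML models trained on the descriptors). It assumes the textbook terminal model: the reactivity ratio is \( r_1 = k_{11}/k_{12} \), each rate constant follows an Arrhenius-type expression with equal pre-exponential factors, and the instantaneous copolymer composition follows the Mayo-Lewis equation. The barrier values, the monomer labels `M1` and `M2`, the temperature, and the function names are all hypothetical and are not taken from CopDDB.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def reactivity_ratio(barrier_homo, barrier_cross, temperature=333.15):
    """Estimate r = k_homo / k_cross from two activation barriers (kJ/mol).

    Assumption: both rate constants share the same pre-exponential factor,
    so only the barrier difference matters. The default temperature
    (333.15 K) is an arbitrary illustrative choice.
    """
    return math.exp(-(barrier_homo - barrier_cross) / (R_KJ * temperature))


def mayo_lewis(f1, r1, r2):
    """Instantaneous mole fraction of monomer 1 in the copolymer (terminal model).

    f1 is the mole fraction of monomer 1 in the feed; f2 = 1 - f1.
    """
    f2 = 1.0 - f1
    numerator = r1 * f1**2 + f1 * f2
    denominator = r1 * f1**2 + 2.0 * f1 * f2 + r2 * f2**2
    return numerator / denominator


# Hypothetical barriers (kJ/mol), keyed by (radical, monomer). Invented values.
barriers = {
    ("M1", "M1"): 30.0,
    ("M1", "M2"): 27.0,
    ("M2", "M2"): 32.0,
    ("M2", "M1"): 29.0,
}

r1 = reactivity_ratio(barriers[("M1", "M1")], barriers[("M1", "M2")])
r2 = reactivity_ratio(barriers[("M2", "M2")], barriers[("M2", "M1")])
F1 = mayo_lewis(0.5, r1, r2)

print(f"r1 = {r1:.3f}, r2 = {r2:.3f}, F1 at f1 = 0.5: {F1:.3f}")
```

A barrier-only estimate like this ignores entropic, solvent, and penultimate-unit effects. That is one reason a data-driven model over a richer descriptor set may be preferable to the closed-form expression.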
In conclusion, by constructing and applying CopDDB, this paper demonstrates the potential for effectively predicting copolymer properties with limited data sets, providing new tools and methods for the development of polymer informatics.