Predicting the Glass Transition Temperature of Biopolymers via High-Throughput Molecular Dynamics Simulations and Machine Learning

Stephan Mohr, Didac Martí, Rémi Pétuya, Emanuele Bosoni, Anne-Claude Dublanchet, Fabien Léonforte
DOI: https://doi.org/10.26434/chemrxiv-2023-t1c4n
2023-12-13
Abstract: Nature has only provided us with a limited number of bio-based and biodegradable building blocks. Therefore, the fine-tuning of sustainable polymer properties is expected to be achieved by controlling the composition of bio-based copolymers for targeted applications such as cosmetics. Until now, the main approaches to alleviate the experimental effort and accelerate the discovery of new polymers have relied on machine learning models trained on experimental data, which requires an enormous and difficult effort to compile data from heterogeneous sources. On the other hand, molecular dynamics simulations of polymers have been shown to accurately capture the experimental trends for a series of properties. However, combining monomers in different ratios in copolymers can rapidly lead to a combinatorial explosion, preventing the investigation of all possibilities via molecular dynamics simulations. In this work, we show that combining machine learning approaches with high-throughput molecular dynamics simulations makes it possible to quickly and efficiently sample and characterize the relevant chemical design space for specific applications. Reliable simulation protocols have been implemented to evaluate the glass transition temperature of a series of 58 homopolymers, which exhibit good agreement with experiments, and 488 copolymers. Overall, 2,184 simulations (4 replicas per polymer) were performed, for a total simulation time of 143.052 µs. These results, constituting a dataset of 546 polymers, have been used to train a machine learning model for the prediction of the MD-calculated glass transition temperature with a mean absolute error of 19.34 K and an R² score of 0.83.
Overall, within its applicability domain, this machine learning model provides an impressive acceleration over molecular dynamics simulations: the glass transition temperature of thousands of polymers can be obtained within seconds, whereas it would have taken node-years to simulate them. This type of approach can be tuned to address different design spaces or different polymer properties and thus has the potential to accelerate the discovery of new polymers.
Chemistry
What problem does this paper attempt to address?
This paper addresses how to predict the glass transition temperature of bio-based polymers by combining high-throughput molecular dynamics (MD) simulations with machine learning. Until now, efforts to reduce experimental work and accelerate polymer discovery have relied mainly on machine learning models trained on experimental data, which requires substantial effort to compile data from heterogeneous sources. MD simulations can accurately capture experimental trends for a range of polymer properties, but varying the monomer ratios in copolymers quickly leads to a combinatorial explosion, so simulating every candidate is impractical.

The paper proposes combining the two approaches to efficiently sample and characterize the chemical design space relevant to a specific application. The researchers ran 2,184 simulations (4 replicas per polymer) covering 58 homopolymers and 488 copolymers, for a total simulation time of 143.052 µs, which yields a dataset of 546 polymers. They then trained a machine learning model to predict the MD-calculated glass transition temperature, reaching a mean absolute error of 19.34 K and an R² score of 0.83. Within its applicability domain, the model is far faster than the simulations it stands in for: it predicts the glass transition temperature of thousands of polymers within seconds, whereas simulating them would take node-years of computing time.

The main contribution is an accelerated route to discovering new polymers, aimed at sustainable, biodegradable materials for targeted applications such as cosmetics. The approach can be adapted to other design spaces or other polymer properties.
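To make the reported figures concrete, the following minimal Python sketch reproduces the dataset bookkeeping from the abstract and shows how a mean absolute error and an R² score are conventionally computed. It is an illustration only, not the authors' code: the helper functions and the four sample temperatures are invented for this example, and the per-simulation average is derived from the stated totals rather than reported in the abstract.

```python
from statistics import mean

def mean_absolute_error(y_true, y_pred):
    """Average of |y_true - y_pred|, in the same unit as the inputs (K here)."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Dataset bookkeeping, using the numbers stated in the abstract.
n_homopolymers, n_copolymers, n_replicas = 58, 488, 4
n_polymers = n_homopolymers + n_copolymers    # 546 polymers
n_simulations = n_polymers * n_replicas       # 2,184 simulations
total_time_us = 143.052                       # total simulated time in µs
# Derived average length of one simulation in ns (not stated in the abstract).
avg_time_ns = total_time_us * 1000 / n_simulations

# Hypothetical glass transition temperatures in K, invented for illustration:
# tg_md plays the role of the MD-calculated target, tg_ml the model prediction.
tg_md = [300.0, 350.0, 400.0, 450.0]
tg_ml = [310.0, 340.0, 420.0, 440.0]

mae = mean_absolute_error(tg_md, tg_ml)
r2 = r2_score(tg_md, tg_ml)

print(f"{n_polymers} polymers, {n_simulations} simulations")
print(f"average simulation length: {avg_time_ns:.1f} ns")
print(f"MAE = {mae:.2f} K, R2 = {r2:.3f}")
```

In the paper's setting, the reference values are the MD-calculated glass transition temperatures rather than experimental ones, so the MAE of 19.34 K measures how closely the model reproduces the simulations.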