Predicting polymerization reactions via transfer learning using chemical language models

Brenda S. Ferrari, Matteo Manica, Ronaldo Giro, Teodoro Laino, Mathias B. Steiner
DOI: https://doi.org/10.1038/s41524-024-01304-8
IF: 12.256
2024-06-06
npj Computational Materials
Abstract: Polymers are candidate materials for a wide range of sustainability applications such as carbon capture and energy storage. However, computational polymer discovery lacks automated analysis of reaction pathways and stability assessment through retro-synthesis. Here, we report an extension of transformer-based language models to polymerization for both reaction and retrosynthesis tasks. To that end, we have curated a polymerization dataset for vinyl polymers covering reactions and retrosynthesis for representative homo-polymers and co-polymers. Overall, we obtain a forward model Top-4 accuracy of 80% and a backward model Top-4 accuracy of 60%. We further analyze the model performance with representative polymerization examples and evaluate its prediction quality from a materials science perspective. To enable validation and reuse, we have made our models and data available in public repositories.
materials science, multidisciplinary; chemistry, physical
What problem does this paper attempt to address?
The paper addresses two gaps in computational polymer discovery: the lack of automated reaction-pathway analysis and the lack of stability assessment through retrosynthesis. Its specific objectives are:
1. **Predicting polymerization reactions**: Extending transformer-based language models to polymerization, covering both forward reaction prediction and retrosynthesis.
2. **Constructing a polymerization dataset**: To train these models, the authors curated a dataset of vinyl polymer reactions, covering reaction and retrosynthesis examples for representative homopolymers and copolymers.
3. **Achieving accurate predictions**: By fine-tuning the language models, the authors report a Top-4 accuracy of 80% for the forward model (predicting products given precursors) and 60% for the backward model (suggesting possible synthesis routes given a polymer).
4. **Evaluating performance from a materials science perspective**: Assessing prediction quality by analyzing representative polymerization examples.
5. **Releasing data and models**: To enable validation and reuse, the authors made their models and data available in public repositories.
In summary, the paper applies transfer learning with transformer-based chemical language models to predict polymerization reaction pathways, with the aim of accelerating the design and discovery of new polymer materials.
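The Top-k accuracy figures quoted above can be read as follows: a test example counts as correct if the reference answer appears anywhere among the model's k highest-ranked candidates. The sketch below illustrates that idea under stated assumptions. It assumes candidates are compared to the reference by exact string equality, which may differ from the paper's own matching criterion (for example, a notation could be canonicalized first). The function name `top_k_accuracy` and the toy strings are hypothetical placeholders, not the paper's code or data.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of examples whose target is among the first k ranked candidates.

    Assumption: a candidate matches the target only if the strings are equal.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(targets):
        raise ValueError("need one ranked candidate list per target")
    if not targets:
        return 0.0
    hits = sum(
        1
        for candidates, target in zip(ranked_predictions, targets)
        if target in candidates[:k]
    )
    return hits / len(targets)


# Toy data (placeholder strings, not real reactions): each inner list is
# ordered from the most to the least confident candidate.
targets = ["p1", "p2", "p3", "p4"]
ranked = [
    ["p1", "x", "y", "z"],  # reference found at rank 1
    ["x", "p2", "y", "z"],  # reference found at rank 2
    ["x", "y", "z", "p3"],  # reference found at rank 4
    ["x", "y", "z", "w"],   # reference not found
]

for k in (1, 2, 4):
    print(f"Top-{k} accuracy: {top_k_accuracy(ranked, targets, k):.2f}")
```

On this toy data the score rises from 0.25 at k = 1 to 0.75 at k = 4, which shows why a Top-4 figure is always at least as high as the corresponding Top-1 figure for the same model.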