Hierarchical Matrix Completion for the Prediction of Properties of Binary Mixtures

Dominik Gond, Jan-Tobias Sohns, Heike Leitte, Hans Hasse, Fabian Jirasek
2024-10-08
Abstract: Predicting the thermodynamic properties of mixtures is crucial for process design and optimization in chemical engineering. Machine learning (ML) methods are gaining increasing attention in this field, but experimental data for training are often scarce, which hampers their application. In this work, we introduce a novel generic approach for improving data-driven models: inspired by the ancient rule "similia similibus solvuntur", we lump components that behave similarly into chemical classes and model them jointly in the first step of a hierarchical approach. While the information on class affiliations can stem in principle from any source, we demonstrate how classes can reproducibly be defined based on mixture data alone by agglomerative clustering. The information from this clustering step is then used as an informed prior for fitting the individual data. We demonstrate the benefits of this approach by applying it in connection with a matrix completion method (MCM) for predicting isothermal activity coefficients at infinite dilution in binary mixtures. Using clustering leads to significantly improved predictions compared to an MCM without clustering. Furthermore, the chemical classes learned from the clustering give exciting insights into what matters on the molecular level for modeling given mixture properties.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the prediction of thermodynamic properties of mixtures in chemical engineering. Its focus is on improving machine learning (ML) predictions of binary mixture properties when experimental training data are scarce. Because experimental data are time-consuming and costly to obtain, methods that make better use of limited data have significant practical value. The paper introduces a generic hierarchical approach inspired by the ancient rule "similia similibus solvuntur" (like dissolves like): components that behave similarly are lumped into chemical classes and modeled jointly in a first step. Class affiliations can in principle come from any source; the authors show that they can be defined reproducibly from mixture data alone by agglomerative clustering. The class-level information is then used as an informed prior when fitting the parameters of the individual components. The approach is demonstrated in connection with a matrix completion method (MCM) for predicting isothermal activity coefficients at infinite dilution in binary mixtures, where it gives significantly better predictions than an MCM without clustering.
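The two-step idea described above can be illustrated with a small, self-contained Python sketch. It is a toy stand-in, not the authors' MCM: the data values are made up, only the rows (solutes) are clustered, and the "informed prior" is reduced to shrinking each entry toward its class mean. All names (`observed`, `row_distance`, `agglomerative_clusters`, `class_column_means`, `complete_with_class_prior`, `prior_strength`) are hypothetical and chosen for illustration only.

```python
import math

# Toy, made-up values standing in for logarithmic activity coefficients at
# infinite dilution. Rows = solutes, columns = solvents, None = no measurement.
# These numbers are illustrative assumptions, not data from the paper.
observed = [
    [0.1, 2.0, None],
    [0.2, None, 3.1],
    [2.5, 0.3, 0.4],
    [None, 0.2, 0.5],
]

def row_distance(a, b):
    """Mean squared difference over entries observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlap: treat the rows as maximally dissimilar
    return sum((x - y) ** 2 for x, y in shared) / len(shared)

def agglomerative_clusters(rows, n_classes):
    """Average-linkage agglomerative clustering; returns lists of row indices."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_classes:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                dists = [row_distance(rows[i], rows[j])
                         for i in clusters[p] for j in clusters[q]]
                d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge the closest pair
        del clusters[q]
    return clusters

def class_column_means(rows, members):
    """Per-column mean over the observed entries of one class (None if empty)."""
    means = []
    for j in range(len(rows[0])):
        vals = [rows[i][j] for i in members if rows[i][j] is not None]
        means.append(sum(vals) / len(vals) if vals else None)
    return means

def complete_with_class_prior(rows, clusters, prior_strength=0.5):
    """Fill and smooth entries using the class mean as a prior mean.

    Missing entries take the class mean; observed entries are shrunk toward it
    with weight prior_strength (posterior mean of a Gaussian with one observation).
    """
    completed = [list(r) for r in rows]
    for members in clusters:
        means = class_column_means(rows, members)
        for i in members:
            for j, m in enumerate(means):
                if m is None:
                    continue  # the class has no data for this column
                y = rows[i][j]
                if y is None:
                    completed[i][j] = m
                else:
                    completed[i][j] = (y + prior_strength * m) / (1.0 + prior_strength)
    return completed

clusters = agglomerative_clusters(observed, n_classes=2)
completed = complete_with_class_prior(observed, clusters)
print(clusters)
print(completed)
```

In this sketch, a component with no measurement for a given solvent inherits the value of its class, which is the sense in which sparse data benefit from the class-level step. A faithful implementation would replace the shrinkage step with a probabilistic matrix factorization whose component features have class-level priors.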