Accelerating Metal-Organic Framework Discovery via Synthesisability Prediction: The MFD Evaluation Method for One-Class Classification Models

Matthew Stephen Dyer, Chi Zhang, Dmytro Antypov, Matthew J Rosseinsky
DOI: https://doi.org/10.26434/chemrxiv-2024-tlmp4
2024-05-22
Abstract: Machine learning has found wide application in the materials field, particularly in discovering structure-property relationships. However, its potential in predicting synthetic accessibility of materials remains relatively unexplored due to the lack of negative data. In this study, we employ several one-class classification (OCC) approaches to accelerate the development of novel metal-organic framework materials by predicting their synthesisability. The evaluation of OCC model performance poses challenges, as traditional evaluation metrics are not applicable when dealing with a single type of data. To overcome this limitation, we introduce a quantitative approach, the Maximum Fractional Difference (MFD) method, to assess and compare model performance, as well as determine optimal thresholds for effectively distinguishing between positives and negatives. A DeepSVDD model with superior predictive capability is proposed. By combining assessment of synthetic viability with porosity prediction models, a list of 3,453 unreported combinations is generated characterised by predictions of high synthesisability and large pore size. The MFD methodology proposed in this study is intended to provide an effective complementary assessment method for addressing the inherent challenges in evaluating OCC models. The research process, developed models, and predicted results of this study are aimed at helping prioritisation of materials for synthesis.
Chemistry
What problem does this paper attempt to address?
This paper addresses the problem of predicting the synthesisability of metal-organic frameworks (MOFs). In materials science, machine learning has been widely used to discover structure-property relationships, but predicting whether a material is synthetically accessible has been less explored because negative data are lacking. The researchers apply several one-class classification (OCC) approaches to accelerate the development of novel MOFs by predicting their synthesisability. Because traditional evaluation metrics are not applicable when only a single type of data is available, they introduce a quantitative evaluation approach, the Maximum Fractional Difference (MFD) method, to assess and compare model performance and to determine the optimal threshold for distinguishing positives from negatives. The paper proposes a DeepSVDD model with superior predictive capability and, by combining the synthesisability assessment with porosity prediction models, generates a list of 3,453 unreported combinations predicted to have high synthesisability and large pore size. The MFD method is intended as a complementary assessment method that addresses the inherent challenges of evaluating OCC models. The research process, developed models, and predicted results are aimed at helping to prioritise materials for synthesis.
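The abstract names the MFD method but does not define it, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes that a model assigns each sample a score where higher means "more synthesisable", that the known (reported) materials are compared with a pool of unlabelled candidates, and that the quantity of interest is the largest gap, over all thresholds, between the fraction of known materials accepted and the fraction of candidates accepted. The function names and the toy scores are hypothetical.

```python
from typing import Sequence, Tuple


def fraction_accepted(scores: Sequence[float], threshold: float) -> float:
    """Fraction of samples whose score is at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def max_fractional_difference(
    known_scores: Sequence[float],
    candidate_scores: Sequence[float],
) -> Tuple[float, float]:
    """Return (largest difference, threshold at which it occurs).

    ASSUMPTION (not taken from the paper): the difference at a threshold is
    fraction of known positives accepted minus fraction of unlabelled
    candidates accepted, and higher scores mean "more synthesisable".
    """
    # Only thresholds equal to an observed score can change either fraction.
    thresholds = sorted(set(known_scores) | set(candidate_scores))
    best_diff, best_thr = float("-inf"), thresholds[0]
    for thr in thresholds:
        diff = fraction_accepted(known_scores, thr) - fraction_accepted(
            candidate_scores, thr
        )
        if diff > best_diff:  # ties keep the lowest threshold
            best_diff, best_thr = diff, thr
    return best_diff, best_thr


# Toy scores invented for illustration; they are not results from the paper.
known = [0.9, 0.8, 0.7, 0.6]
candidates = [0.75, 0.5, 0.4, 0.3]
mfd, threshold = max_fractional_difference(known, candidates)
print(f"MFD = {mfd:.2f} at threshold {threshold}")
```

Under these assumptions, a larger maximum difference would indicate a model that separates known materials from the candidate pool more sharply, and the threshold at which it occurs would serve as the decision boundary. Consult the paper itself for the actual definition.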