ARC–MOF: A Diverse Database of Metal-Organic Frameworks with DFT-Derived Partial Atomic Charges and Descriptors for Machine Learning

Jake Burner, Jun Luo, Andrew White, Adam Mirmiran, Ohmin Kwon, Peter G. Boyd, Stephen Maley, Marco Gibaldi, Scott Simrod, Victoria Ogden, Tom K. Woo
DOI: https://doi.org/10.1021/acs.chemmater.2c02485
IF: 10.508
2023-01-20
Chemistry of Materials
Abstract: Metal–organic frameworks (MOFs) are a class of crystalline materials composed of metal nodes or clusters connected via semi-rigid organic linkers. Owing to their high surface area, porosity, and tunability, MOFs have received significant attention for numerous applications such as gas separation and storage. Atomistic simulations and data-driven methods [e.g., machine learning (ML)] have been successfully employed to screen large databases and to develop new, experimentally synthesized and validated MOFs for CO2 capture. To enable data-driven materials discovery for any application, the first (and arguably most crucial) step is database curation. This work introduces the ab initio REPEAT charge MOF (ARC–MOF) database. This is a database of ∼280,000 MOFs which have been either experimentally characterized or computationally generated, spanning all publicly available MOF databases. A key feature of ARC–MOF is that it contains density functional theory-derived electrostatic potential fitted partial atomic charges for each MOF. Additionally, ARC–MOF contains pre-computed descriptors for out-of-the-box ML applications. An in-depth analysis of the diversity of ARC–MOF with respect to the currently mapped design space of MOFs was performed, a critical yet commonly overlooked aspect of previously reported MOF databases. Using this analysis, balanced subsets from ARC–MOF for various ML purposes have been identified, with a case study of the effect of the training set on ML performance. Other chemical and geometric diversity analyses are presented, along with an analysis of the effect of the charge-assignment method on atomistic simulation of gas uptake in MOFs.
materials science, multidisciplinary; chemistry, physical