Transferability in Machine Learning for Electronic Structure via the Molecular Orbital Basis

Matthew Welborn, Lixue Cheng, Thomas F. Miller III
DOI: https://doi.org/10.1021/acs.jctc.8b00636
2018-07-27
Abstract: We present a machine learning (ML) method for predicting electronic structure correlation energies using Hartree-Fock input. The total correlation energy is expressed in terms of individual and pair contributions from occupied molecular orbitals, and Gaussian process regression is used to predict these contributions from a feature set that is based on molecular orbital properties, such as Fock, Coulomb, and exchange matrix elements. With the aim of maximizing transferability across chemical systems and compactness of the feature set, we avoid the usual specification of ML features in terms of atom- or geometry-specific information, such as atom/element-types, bond-types, or local molecular structure. ML predictions of MP2 and CCSD energies are presented for a range of systems, demonstrating that the method maintains accuracy while providing transferability both within and across chemical families; this includes predictions for molecules with atom-types and elements that are not included in the training set. The method holds promise both in its current form and as a proof-of-principle for the use of ML in the design of generalized density-matrix functionals.
Chemical Physics
What problem does this paper attempt to address?
This paper addresses how to predict electronic structure correlation energies from Hartree-Fock input using machine learning (ML), with particular attention to the transferability of the model between different chemical systems. Traditional methods usually rely on atom- or geometry-specific features, such as atom types, bond types, or local molecular structure, which limit the generality of the model and its predictive ability for unseen elements or compounds. The paper proposes an ML method based on molecular orbital properties that avoids these atom- and geometry-specific features, thereby improving the compactness of the feature set and its transferability across chemical systems. Specifically, the main objectives of the paper include:

1. **Improve transferability**: Develop an ML model that makes accurate predictions both within and across chemical families, and that maintains accuracy even for molecules containing atom types or elements not seen in the training set.
2. **Reduce the number of features**: Construct a compact feature set from Hartree-Fock molecular orbital properties instead of traditional atom- or geometry-specific features, thereby reducing the complexity of the model and the amount of data required for training.
3. **Maintain prediction accuracy**: While improving transferability, preserve the accuracy of the predicted MP2 and CCSD energies, so that the method can be applied to practical chemical calculations and research.

By introducing a feature set based on molecular orbitals and using Gaussian process regression (GPR) to predict the diagonal and off-diagonal contributions to the correlation energy, the paper demonstrates the effectiveness and potential of this method.
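The decomposition into diagonal and off-diagonal contributions described above can be written compactly. The following is a sketch in notation chosen here for illustration, not taken from the source: $\mathbf{f}_i$ and $\mathbf{f}_{ij}$ stand for the feature vectors of occupied orbital $i$ and of the orbital pair $(i, j)$, and $\varepsilon_d$ and $\varepsilon_o$ stand for the two regression models.

```latex
% Total correlation energy as a sum over occupied molecular orbitals.
% Symbol names (f_i, f_ij, eps_d, eps_o) are illustrative assumptions.
E_c \;=\; \sum_{i}^{\mathrm{occ}} \varepsilon_d\!\left[\mathbf{f}_i\right]
      \;+\; \sum_{i \neq j}^{\mathrm{occ}} \varepsilon_o\!\left[\mathbf{f}_{ij}\right]
```

Because the features are built from molecular orbital quantities (for example Fock, Coulomb, and exchange matrix elements) rather than atom types, the same two models can in principle be evaluated on any molecule for which a Hartree-Fock calculation is available.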
The reported results show that the method transfers well between different geometries of a single molecule and between different molecular families, and that it can accurately predict the correlation energy of molecules containing elements absent from the training set.
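To make the regression step concrete, here is a minimal sketch of Gaussian process regression applied to diagonal and off-diagonal energy contributions, assuming NumPy is available. Everything in it is hypothetical: the function names (`rbf_kernel`, `gpr_fit`, `gpr_predict`, `correlation_energy`), the squared-exponential kernel, the two-component feature vectors, and the synthetic targets are placeholders, not the paper's actual features, kernel, or data.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gpr_fit(x_train, y_train, noise=1e-10, length_scale=0.5):
    """Fit a zero-mean GP; returns what is needed to evaluate the posterior mean."""
    k = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    chol = np.linalg.cholesky(k)
    # Solve (K + noise * I) alpha = y using the Cholesky factor.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    return x_train, alpha, length_scale


def gpr_predict(x_new, model):
    """Posterior mean of the GP at the rows of x_new."""
    x_train, alpha, length_scale = model
    return rbf_kernel(x_new, x_train, length_scale) @ alpha


def correlation_energy(diag_feats, offdiag_feats, diag_model, offdiag_model):
    """Sum predicted diagonal and off-diagonal contributions."""
    e_diag = gpr_predict(diag_feats, diag_model).sum()
    e_off = gpr_predict(offdiag_feats, offdiag_model).sum()
    return float(e_diag + e_off)


# Synthetic placeholder data: a 5x5 grid of two-component "feature vectors".
grid = np.linspace(-1.0, 1.0, 5)
feats = np.array([[u, v] for u in grid for v in grid])

# Made-up smooth targets standing in for orbital energy contributions.
y_diag = -0.02 * np.exp(-np.sum(feats**2, axis=1))
y_off = -0.005 * np.cos(feats[:, 0]) * np.cos(feats[:, 1])

diag_model = gpr_fit(feats, y_diag)
offdiag_model = gpr_fit(feats, y_off)

# Evaluating on the training features should reproduce the training total.
e_corr = correlation_energy(feats, feats, diag_model, offdiag_model)
print(e_corr, y_diag.sum() + y_off.sum())
```

Two separate models are fit because the source describes diagonal and off-diagonal contributions as distinct regression targets. In a real application the feature vectors would come from a Hartree-Fock calculation and the targets from MP2 or CCSD pair energies.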