The "DNA" of chemistry: Scalable quantum machine learning with "amons"

Bing Huang, O. Anatole von Lilienfeld
2017-01-01
Abstract: Given sufficient examples, recently introduced machine learning models enable rapid, yet accurate, predictions of properties of new molecules. Extrapolation to larger molecules with differing composition is prohibitive due to all the specific chemistries which would be required for training. We address this problem by exploiting redundancies due to chemical similarity of repeating building blocks, each represented by an effective atom in molecule: the am-on. In analogy to the DNA sequence in a gene encoding its function, constituting amons encode a query molecule's properties. The use of amons affords highly accurate machine learning predictions of quantum properties of arbitrary query molecules in real time. We investigate this approach for predicting energies of various covalently and non-covalently bonded systems. After training on the few amons detected, very low prediction errors can be reached, on par with experimental uncertainty. Systems studied include two dozen large biomolecules, eleven thousand medium sized organic molecules, large common polymers, water clusters, doped $h$BN sheets, bulk silicon, and Watson-Crick DNA base pairs. Conceptually, the amons extend Mendeleev's table to account for the chemical environments of elements. They represent an important stepping stone to machine learning based virtual chemical space exploration campaigns.