Learning to Make Chemical Predictions: the Interplay of Feature Representation, Data, and Machine Learning Algorithms

Mojtaba Haghighatlari, Jie Li, Farnaz Heidar-Zadeh, Yuchen Liu, Xingyi Guan, Teresa Head-Gordon
DOI: https://doi.org/10.48550/arXiv.2003.00157
IF: 2.552
2020-02-29
Chemical Physics
Abstract: Supervised machine learning has recently gained prominence as a source of new predictive approaches for applications in the chemical, biological, and materials sciences. In this Perspective we focus on the interplay of machine learning algorithms with chemically motivated descriptors and with the size and type of data sets needed for molecular property prediction. Using Nuclear Magnetic Resonance chemical shift prediction as an example, we demonstrate that success is predicated on the choice of feature-extracted or real-space representations of chemical structures, on whether the molecular property data are abundant and whether they are experimentally or computationally derived, and on how these factors together influence the correct choice among popular machine learning algorithms drawn from deep learning, random forests, or kernel methods.