DeepSPInN - Deep reinforcement learning for molecular Structure Prediction from Infrared and 13C NMR spectra

Sriram Devata, Bhuvanesh Sridharan, Sarvesh Mehta, Yashaswi Pathak, Siddhartha Laghuvarapu, Girish Varma, Deva Priyakumar
DOI: https://doi.org/10.26434/chemrxiv-2023-drhmj-v2
2024-01-17
Abstract: Molecular spectroscopy studies the interaction of molecules with electromagnetic radiation, and interpreting the resultant spectra is invaluable for deducing molecular structures. However, predicting the molecular structure from spectroscopic data is a strenuous task that requires highly specific domain knowledge. DeepSPInN is a deep reinforcement learning method that predicts the molecular structure when given infrared and 13C nuclear magnetic resonance spectra. It formulates the molecular structure prediction problem as a Markov decision process (MDP) and employs Monte Carlo tree search to explore and choose the actions in the formulated MDP. On the QM9 dataset, DeepSPInN predicts the correct molecular structure for 91.5% of the input spectra in an average time of 77 seconds for molecules with fewer than 10 heavy atoms. This study is the first of its kind that uses only infrared and 13C nuclear magnetic resonance spectra for molecular structure prediction without referring to any pre-existing spectral databases or molecular fragment knowledge bases, and is a leap forward in automated molecular spectral analysis.
Chemistry
What problem does this paper attempt to address?
The paper addresses how to predict molecular structures from infrared (IR) and 13C nuclear magnetic resonance (NMR) spectra using deep reinforcement learning. Predicting a molecular structure from spectral data is difficult and normally requires specialized domain knowledge. DeepSPInN formulates the problem as a Markov decision process (MDP) and uses Monte Carlo tree search to explore and select actions. On the QM9 dataset, DeepSPInN predicts the correct structure for 91.5% of input spectra, in an average of 77 seconds, for molecules containing fewer than 10 heavy atoms. According to the authors, it is the first method to use only IR and 13C NMR spectra for molecular structure prediction without relying on pre-existing spectral databases or molecular fragment knowledge bases, and they present it as a significant advance in automated molecular spectral analysis. The paper also describes dataset selection, the model architecture, the reward function, and the training method.
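To make the MDP and tree-search formulation concrete, here is a minimal toy sketch. It is not the authors' implementation. It assumes a hypothetical setup in which a "molecule" is a short string of heavy atoms, each action appends one atom, and the "spectrum" is simply the atom-count vector. It uses plain UCT with random rollouts and omits any learned networks. All names (ATOMS, MAX_ATOMS, toy_spectrum, reward, predict) are illustrative assumptions.

```python
import math
import random
from collections import Counter

ATOMS = "CNO"   # hypothetical action set: which heavy atom to append
MAX_ATOMS = 4   # hypothetical episode length (toy molecule size)

def toy_spectrum(state):
    """Toy stand-in for a spectrum: the count of each atom type."""
    counts = Counter(state)
    return [counts[a] for a in ATOMS]

def reward(state, target_spectrum):
    """1.0 for a perfect match, decreasing linearly with L1 distance."""
    dist = sum(abs(x - y) for x, y in zip(toy_spectrum(state), target_spectrum))
    return 1.0 - dist / (2 * MAX_ATOMS)

def is_terminal(state):
    return len(state) == MAX_ATOMS

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}  # action -> Node
        self.visits = 0
        self.total = 0.0

def uct_child(node, c=1.4):
    """Pick the child maximizing mean value plus an exploration bonus."""
    def score(child):
        mean = child.total / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=score)

def simulate(root, target_spectrum, rng):
    path, node = [root], root
    # Selection: descend while the current node is fully expanded.
    while not is_terminal(node.state) and len(node.children) == len(ATOMS):
        node = uct_child(node)
        path.append(node)
    # Expansion: add one untried action, if the node is not terminal.
    if not is_terminal(node.state):
        action = rng.choice([a for a in ATOMS if a not in node.children])
        node.children[action] = Node(node.state + (action,))
        node = node.children[action]
        path.append(node)
    # Rollout: finish the molecule with random actions.
    state = node.state
    while not is_terminal(state):
        state = state + (rng.choice(ATOMS),)
    value = reward(state, target_spectrum)
    # Backpropagation: update statistics along the visited path.
    for n in path:
        n.visits += 1
        n.total += value

def predict(target_spectrum, n_sim=300, seed=0):
    """Build a molecule atom by atom, running a fresh search at each step."""
    rng = random.Random(seed)
    state = ()
    while not is_terminal(state):
        root = Node(state)
        for _ in range(n_sim):
            simulate(root, target_spectrum, rng)
        best = max(root.children, key=lambda a: root.children[a].visits)
        state = state + (best,)
    return "".join(state)
```

Calling `predict(toy_spectrum("CCNO"))` returns a four-atom string whose toy spectrum the search has tried to match to the target. In a realistic setting the state would be a molecular graph, the actions would add atoms or bonds, and the reward would compare predicted and target IR and 13C NMR spectra.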