Machine Learning Enhanced Spectrum Recognition Based on Computer Vision (SRCV) for Intelligent NMR Data Extraction

Wenqiang Jia, Zhuo Yang, Minjian Yang, Liang Cheng, Zengrong Lei, Xiaojian Wang
DOI: https://doi.org/10.1021/acs.jcim.0c01046
IF: 6.162
2020-11-10
Journal of Chemical Information and Modeling
Abstract: A machine learning enhanced spectrum recognition system, spectrum recognition based on computer vision (SRCV), has been developed to extract data from previously analyzed ¹³C and ¹H NMR spectra. The system comprises four function modules that extract data from three areas of an NMR image: the ¹³C and ¹H chemical shifts, the integrals, and the range of the shift values. Three machine learning models were pretrained for number recognition, which is the key step in NMR data extraction. The k-nearest neighbor (kNN) method with optimized k (k = 4) was selected; it displayed a 100% recognition rate. SRCV was then tested and validated, showing high accuracy with a short processing time (11–21 s) per NMR spectral image. The recognizer enables high-throughput ¹³C and ¹H NMR data extraction from the abundant spectra in the literature and has the potential to be used for spectral database construction. In addition, the system could be extended to import data into computer-assisted structure elucidation systems, which would automate that procedure significantly. SRCV is available on GitHub (https://github.com/WJmodels/SRCV). The Supporting Information (experimental section and supporting results, PDF) is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.0c01046.
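The number-recognition step described in the abstract can be illustrated with a minimal k-nearest-neighbor sketch using k = 4. This is not the authors' implementation: the 3x5 digit glyphs, the noise model, and the names `GLYPHS`, `knn_predict`, and `recognize_digits` are all hypothetical, chosen only to show how a majority vote among the four closest training samples assigns a digit label.

```python
import math
import random
from collections import Counter

# Hypothetical 3x5 binary glyphs (row-major), standing in for digit crops
# cut out of an NMR spectrum image. Not taken from the SRCV code base.
GLYPHS = {
    "0": "111101101101111",
    "1": "010110010010111",
    "2": "111001111100111",
    "3": "111001111001111",
    "4": "101101111001001",
    "5": "111100111001111",
    "6": "111100111101111",
    "7": "111001001001001",
    "8": "111101111101111",
    "9": "111101111001111",
}


def to_vector(bits):
    """Turn a '0'/'1' string into a list of float pixel intensities."""
    return [float(b) for b in bits]


def noisy(vec, rng, amplitude=0.05):
    """Add small uniform noise to each pixel (assumed noise model)."""
    return [v + rng.uniform(-amplitude, amplitude) for v in vec]


def knn_predict(train, query, k=4):
    """Label `query` by majority vote among its k nearest training samples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def recognize_digits(train, queries, k=4):
    """Classify a sequence of glyph vectors and join the predicted digits."""
    return "".join(knn_predict(train, q, k) for q in queries)


# Five noisy training samples per digit, so each class can fill all k = 4 votes.
rng = random.Random(0)
train = [
    (noisy(to_vector(bits), rng), digit)
    for digit, bits in GLYPHS.items()
    for _ in range(5)
]

# Usage: recognize the digit sequence "128" from freshly perturbed glyphs.
queries = [noisy(to_vector(GLYPHS[d]), rng) for d in "128"]
print(recognize_digits(train, queries))
```

With noise this small, every sample stays much closer to its own glyph than to any other, so all four neighbors come from the correct class. Real digit crops would need preprocessing (binarization, resizing) before a distance-based vote like this is meaningful.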
chemistry, multidisciplinary, medicinal, computer science, interdisciplinary applications, information systems
What problem does this paper attempt to address?