Machine learning prediction of organic moieties from the IR spectra, enhanced by additionally using the derivative IR data

Maurycy Krzyżanowski, Grzegorz Matyszczak
DOI: https://doi.org/10.1007/s11696-024-03301-z
IF: 2.146
2024-02-01
Chemical Papers
Abstract: Infrared spectroscopy is a crucial analytical tool in organic chemistry, but interpreting IR data can be challenging. This study provides a comprehensive analysis of five machine learning models, namely logistic regression, KNN (k-nearest neighbors), SVM (support vector machine), random forest, and MLP (multilayer perceptron), and of their effectiveness in interpreting IR spectra. The simple KNN model outperformed the more complex SVM model in both execution time and F1 score, demonstrating the potential of simpler models for interpreting IR data. Combining the original spectra with their corresponding derivatives improved the performance of all models with only a minimal increase in execution time. Denoising of the IR data was investigated but did not significantly improve performance. Although the MLP model outperformed the KNN model, its execution time was substantially longer. Ultimately, KNN is recommended for rapid results with minimal performance compromise, while MLP is suggested for projects that prioritize accuracy despite the longer execution time.
chemistry, multidisciplinary
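
The abstract's central idea, appending a derivative spectrum to the original spectrum before classification, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' pipeline: the abstract does not say how the derivatives were computed, so a simple finite-difference first derivative with unit spacing is used; the spectra are synthetic Gaussian peaks rather than real IR data; the task is reduced to a single binary label (moiety present or absent); and the helper names `first_derivative`, `augment`, `knn_predict`, and `gaussian_peak` are hypothetical.

```python
import math
from collections import Counter


def first_derivative(spectrum):
    """Finite-difference first derivative (central inside, one-sided at the ends).

    Assumes evenly spaced points with unit spacing; this is an illustrative
    choice, not necessarily the derivative method used in the paper.
    """
    n = len(spectrum)
    if n < 2:
        raise ValueError("spectrum needs at least two points")
    deriv = [0.0] * n
    deriv[0] = spectrum[1] - spectrum[0]
    deriv[-1] = spectrum[-1] - spectrum[-2]
    for i in range(1, n - 1):
        deriv[i] = (spectrum[i + 1] - spectrum[i - 1]) / 2.0
    return deriv


def augment(spectrum):
    """Concatenate the original spectrum with its first derivative."""
    return list(spectrum) + first_derivative(spectrum)


def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    ranked = sorted((math.dist(x, query), label) for x, label in zip(train_X, train_y))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def gaussian_peak(center, n_points=20, width=1.5):
    """Synthetic stand-in for a spectrum: one Gaussian band at `center`."""
    return [math.exp(-(((i - center) / width) ** 2)) for i in range(n_points)]


# Toy training set: label 1 = band near index 5, label 0 = band near index 15.
train_spectra = [gaussian_peak(5), gaussian_peak(6), gaussian_peak(14), gaussian_peak(15)]
train_labels = [1, 1, 0, 0]
train_X = [augment(s) for s in train_spectra]

query = augment(gaussian_peak(5.5))
prediction = knn_predict(train_X, train_labels, query, k=3)
print(prediction)
```

The augmented vector is twice as long as the original spectrum, which fits the abstract's remark that the added derivative data costs only a small increase in execution time. In practice a library implementation of KNN and a smoothed derivative would likely be preferable to this hand-rolled version.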