Determination of Anti-SARS-CoV-2 Activity of Compounds Based on Machine Learning

Zeyu Cui, Qing Liu, Mengmeng Fan, Dakuo He, Yue Hou
DOI: https://doi.org/10.1109/itcem57303.2022.00021
2022-01-01
Abstract: In 2020, pneumonia caused by the novel coronavirus spread rapidly around the world. In the absence of a specific drug, the novel coronavirus remains a global pandemic. In this paper, we propose an improved molecular activity prediction model that adds a feature selection step, chosen after comparing different molecular feature extraction methods and machine learning models. We first constructed a data set from anti-SARS-CoV-2 compounds reported in the recent literature and then built three machine learning models. For each model, we tried three methods of extracting molecular features. To further improve model performance, we added three feature selection methods. After comparing the resulting models, we selected an SVM model that uses FCFP to extract molecular features and lasso for feature selection. Its test set accuracy is 90.0% and its AUC is 0.961, indicating that it predicts the anti-SARS-CoV-2 activity of compounds well. Our model can be used to speed up the research and discovery of anti-SARS-CoV-2 drugs.
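The abstract reports two test-set metrics, accuracy and AUC. As a minimal sketch of how such numbers are typically computed for a binary active/inactive classifier, the snippet below uses plain Python. The labels and scores are made-up toy values, not data from the paper, and the function names `accuracy` and `roc_auc` are illustrative assumptions rather than the authors' code.

```python
def accuracy(y_true, y_pred):
    """Fraction of compounds whose predicted class matches the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen active compound scores higher than a randomly chosen inactive one
    (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (1 = active, 0 = inactive); NOT from the paper.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]  # hypothetical classifier scores
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 threshold

toy_accuracy = accuracy(toy_labels, toy_preds)
toy_auc = roc_auc(toy_labels, toy_scores)
```

Accuracy depends on the decision threshold (0.5 is assumed here), while AUC depends only on how the scores rank active compounds against inactive ones, which is why the two are usually reported together.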