SAE-SV: A Stacked-AutoEncoder and Soft Voting Joint Approach Based on Small Dataset with High Dimensions for Inhibitory Potency Prediction

Xiaoguang Ma, Zhizhe Lin, Haotian Zhang
DOI: https://doi.org/10.1145/3644116.3644315
2023-10-20
Abstract: Traditional drug screening methods were time- and labor-intensive and difficult to run on a large scale, so an efficient, rapid, and low-cost drug screening method was highly desirable. In this paper, we proposed a joint Stacked-AutoEncoder (SAE) and soft-voting (SV) approach to predict the inhibitory potency of compounds. Unlike conventional machine learning and deep learning methods, the SAE-SV model needed only a small dataset for training: the prediction task was transformed into a simpler binary classification task that tested the relative rank of compounds to determine whether each was worth further investigation. First, we designed a data augmentation strategy that expanded the dataset and made effective use of the relationships between compounds. Next, we employed an Embedded feature-selection method and the SAE to reduce the data dimensionality and efficiently select discriminative molecular descriptors for the classification task. Finally, an ensemble learning model based on the SV mechanism was introduced for classification. Extensive experiments showed that the SAE-SV model achieved high accuracy, providing a potential ligand-based bioinformatics method for prioritizing chemicals for experimental studies.
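As a rough illustration of the soft-voting step named in the abstract, the sketch below averages the class probabilities produced by several classifiers and picks the class with the highest mean. It is a generic version of soft voting, not the authors' implementation: the function name `soft_vote`, the optional `weights` argument, and the probability values are all assumptions made for this example.

```python
def soft_vote(prob_lists, weights=None):
    """Combine per-class probabilities from several classifiers.

    prob_lists: one list of class probabilities per classifier
                (hypothetical values; the paper's base models are not given here).
    weights:    optional per-classifier weights (assumed; defaults to equal).
    Returns (predicted_label, averaged_probabilities).
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_classes = len(prob_lists[0])
    # Weighted mean of each class probability across classifiers.
    avg = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    # The predicted label is the class with the largest averaged probability.
    label = max(range(n_classes), key=lambda c: avg[c])
    return label, avg


# Three hypothetical classifiers scoring one sample on a binary task.
probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]
label, avg = soft_vote(probs)
weighted_label, weighted_avg = soft_vote(probs, weights=[6, 1, 1])
```

With equal weights the two classifiers that favor class 1 outvote the first; giving the first classifier a large enough weight flips the decision, which is the main way soft voting differs from a simple majority vote.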
Computer Science, Medicine, Chemistry