Autoencoder-based Dimensionality Reduction for QSAR Modeling

Alaaeldin M. Hafez, Isra M. Al-Turaiki, Shrooq A. Alsenan
DOI: https://doi.org/10.1109/ICCAIS48893.2020.9096747
2020-03-01
Abstract: Recent advances in machine learning tools and algorithms have influenced many fields, including drug discovery. Nowadays, research conducted via trial-and-error experiments has been replaced by computational approaches. This growth has driven substantial development in synthesizing chemical data to support chemoinformatics research. One of the most widely used tools for modeling chemoinformatics problems is Quantitative Structure-Activity Relationship (QSAR) modeling. Earlier QSAR models dealt with small datasets and a limited number of features. Current QSAR datasets suffer from high dimensionality, where the number of features exceeds the number of records. Over the years, the curse of dimensionality has been a major limitation of QSAR classification models. Linear Principal Component Analysis (PCA) is a popular feature extraction method used to reduce the dimensionality of QSAR datasets. However, QSAR datasets are highly complex and require a deep understanding of feature representation. The autoencoder is a type of neural network that has not been fully explored for dimensionality reduction in QSAR modeling. In this research, we investigate the impact of autoencoder-based dimensionality reduction on a high-dimensional QSAR dataset. The performance of the autoencoder is compared with that of PCA using overall accuracy. Our preliminary analysis demonstrated that the proposed technique outperforms PCA.
Chemistry, Computer Science
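The comparison described in the abstract, linear PCA versus an autoencoder for shrinking a dataset with more features than records, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the synthetic data, the single tanh bottleneck layer, the layer size `k`, the learning rate, and the function names `standardize`, `pca_reduce`, and `autoencoder_reduce` are all hypothetical. The paper compares overall classification accuracy; this sketch uses reconstruction error as a simpler stand-in. It requires NumPy.

```python
import numpy as np


def standardize(X):
    """Scale each feature to (approximately) zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0) + 1e-8  # guard against constant descriptor columns
    return (X - mean) / std


def pca_reduce(Xs, k):
    """Linear PCA via SVD: return the k-dim scores and the reconstruction MSE."""
    Xc = Xs - Xs.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k].T                  # (n_features, k) projection matrix
    Z = Xc @ W                    # reduced representation
    mse = float(np.mean((Z @ W.T - Xc) ** 2))
    return Z, mse


def autoencoder_reduce(Xs, k, epochs=2000, lr=0.01, seed=0):
    """Hypothetical minimal autoencoder: tanh encoder with k bottleneck units,
    linear decoder, trained by full-batch gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    n, d = Xs.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, k))
    b1 = np.zeros(k)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(k), (k, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(Xs @ W1 + b1)    # (n, k) bottleneck code
        E = H @ W2 + b2 - Xs         # reconstruction error, (n, d)
        losses.append(float(np.mean(E ** 2)))
        # Backpropagate the MSE gradient through the decoder, then the encoder.
        dR = 2.0 * E / E.size
        dW2, db2 = H.T @ dR, dR.sum(axis=0)
        dA = (dR @ W2.T) * (1.0 - H ** 2)  # tanh'(a) = 1 - tanh(a)^2
        dW1, db1 = Xs.T @ dA, dA.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    Z = np.tanh(Xs @ W1 + b1)        # encoded (reduced) features
    return Z, losses


# Synthetic stand-in for a QSAR table: 40 records, 200 descriptors (features > records).
rng = np.random.default_rng(42)
latent = rng.normal(size=(40, 5))
mixing = rng.normal(size=(5, 200))
X = np.tanh(latent @ mixing) + 0.1 * rng.normal(size=(40, 200))
Xs = standardize(X)

Z_pca, pca_mse = pca_reduce(Xs, k=5)
Z_ae, losses = autoencoder_reduce(Xs, k=5)
print(Z_pca.shape, Z_ae.shape)  # (40, 5) (40, 5)
print(f"PCA MSE {pca_mse:.3f}, autoencoder MSE {losses[-1]:.3f}")
```

The nonlinear tanh encoder is what lets an autoencoder capture structure that a linear projection cannot; with purely linear activations and squared-error loss, an autoencoder learns the same subspace as PCA. In a setup closer to the paper's, `Z_pca` and `Z_ae` would each be fed to a classifier and compared on accuracy.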