From High Dimensions to Human Insight: Exploring Dimensionality Reduction for Chemical Space Visualization
Alexey A. Orlov, Tagir N. Akhmetshin, Dragos Horvath, Gilles Marcou, Alexandre Varnek
DOI: https://doi.org/10.1002/minf.202400265
IF: 4.05
2024-12-07
Molecular Informatics
Abstract: Dimensionality reduction is an important exploratory data analysis method that allows high‐dimensional data to be represented in a human‐interpretable lower‐dimensional space. It is extensively applied in the analysis of chemical libraries, where chemical structure data, represented as high‐dimensional feature vectors, are transformed into 2D or 3D chemical space maps. In this paper, commonly used dimensionality reduction techniques, namely Principal Component Analysis (PCA), t‐Distributed Stochastic Neighbor Embedding (t‐SNE), Uniform Manifold Approximation and Projection (UMAP), and Generative Topographic Mapping (GTM), are evaluated in terms of neighborhood preservation and their ability to visualize sets of small molecules from the ChEMBL database.
Chemistry, Medicinal; Mathematical & Computational Biology; Computer Science, Interdisciplinary Applications
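
The abstract describes projecting high‐dimensional feature vectors onto a 2D map and judging the result by neighborhood preservation. The following is a minimal sketch of that idea, not the authors' protocol. It assumes PCA computed via SVD as the projection, random binary vectors as a stand‐in for molecular fingerprints, and one simple preservation score (the average fraction of each point's k nearest neighbors that are kept after projection). The function names and the choice of k are illustrative; the abstract does not specify the metrics the paper actually uses.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def knn_indices(X, k):
    """Indices of the k nearest Euclidean neighbors of each row, excluding itself."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared distances
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    return np.argsort(d2, axis=1)[:, :k]


def neighborhood_preservation(X_high, X_low, k=10):
    """Average fraction of k nearest neighbors shared between the two spaces.

    Illustrative score (an assumption), ranging from 0 (none kept) to 1 (all kept).
    """
    high = knn_indices(X_high, k)
    low = knn_indices(X_low, k)
    shared = [len(set(a) & set(b)) for a, b in zip(high, low)]
    return float(np.mean(shared)) / k


# Synthetic stand-in for fingerprints: 200 "molecules", 64 binary features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 64)).astype(float)

Y = pca_project(X, n_components=2)  # 2D "chemical space map"
score = neighborhood_preservation(X, Y, k=10)
print(f"Neighborhood preservation (k=10): {score:.3f}")
```

The same score could be computed for any other embedding (for example, coordinates produced by t‐SNE, UMAP, or GTM) by passing those 2D coordinates as `X_low`, which would allow methods to be compared on equal terms.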