A Computational Approach to Interpreting the Embedding Space of Dimension Reduction

Bingyuan Zhang, Kohei Uno, Hayata Kodama, Koichi Himori, Yusuke Matsui
DOI: https://doi.org/10.1101/2024.06.23.600292
2024-06-27
Abstract: Nonlinear dimension reduction methods are widely applied in studies of gene and protein expression because they reveal patterns of discrete groups and continuous orderings in high-dimensional data. However, tools for understanding the biological mechanisms behind the resulting embedding structures are limited, which hinders full exploitation of the data. Here, we propose a novel framework that interprets an embedding systematically by identifying and mapping its associated biological functions. The method performs statistical tests and visualizes the significantly enriched functions that are essential to the organization of the embedding structure. We applied it to the embedding results of two datasets: the Genotype Tissue Expression dataset, which captures distinct cluster structures, and a Caenorhabditis elegans embryogenesis dataset, which captures continuous developmental trajectories. We identified the functions associated with the two embeddings and confirmed that the framework, by annotating the embedding space, is a useful explainable AI tool for exploratory data analysis.
Bioinformatics