Lens functions for exploring UMAP Projections with Domain Knowledge

Daniel M. Bot, Jan Aerts
2024-05-15
Abstract: Dimensionality reduction algorithms are often used to visualise high-dimensional data. Previous studies have used prior information to enhance or suppress expected patterns in projections. In this paper, we adapt such techniques for interactive exploration guided by domain knowledge. Inspired by Mapper and STAD, we present three types of lens functions for UMAP, a state-of-the-art dimensionality reduction algorithm. Lens functions enable analysts to adapt projections to their questions, revealing otherwise hidden patterns. They filter the modelled connectivity to explore the interaction between manually selected features and the data's structure, creating configurable perspectives, each potentially revealing new insights. The effectiveness of the lens functions is demonstrated in two use cases, and their computational cost is analysed in a synthetic benchmark. Our implementation is available in an open-source Python package:
Machine Learning, Computational Geometry, Human-Computer Interaction
What problem does this paper attempt to address?
This paper addresses how domain knowledge can be used in dimensionality-reduction visualizations of high-dimensional data to enhance or suppress the visibility of specific patterns, so that analysts can discover hidden patterns relevant to their own research questions. Specifically, the paper proposes three types of "lens functions" for UMAP, a state-of-the-art dimensionality-reduction algorithm. These functions adjust the projection based on domain knowledge supplied by the analyst, revealing patterns that might otherwise be overlooked. In this way, the paper aims to make data analysis more interactive and exploratory, enabling analysts to view the data from multiple perspectives and gain different insights.

The main contributions of the paper are:

1. Three types of lens functions for the UMAP model, which adjust the embedding according to domain knowledge in order to answer specific questions.
2. Two case studies that demonstrate an exploration workflow built on these lens functions and explain the scenarios in which each lens type is most applicable.
3. An open-source Python package that implements the proposed functions and the demonstrated case studies, making them available to other researchers and analysts.

The lens functions work by changing the connectivity in the UMAP model. This lets analysts emphasize or explore patterns in the data according to specific features of interest or additional signals, adding flexibility to the visualization of high-dimensional data.
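
The idea of a lens that filters a model's connectivity can be illustrated with a small, self-contained sketch. It is not the authors' implementation and does not use their package or UMAP itself. It assumes a brute-force k-nearest-neighbour graph as a stand-in for the modelled connectivity, and a Mapper-style rule that keeps an edge only when both endpoints fall in the same or an adjacent segment of an analyst-chosen lens feature. The names `knn_edges`, `segment_of`, and `apply_lens`, and the parameter `n_segments`, are hypothetical and exist only for this illustration.

```python
import math
import random


def knn_edges(points, k):
    """Return undirected edges (i, j), i < j, linking each point to its k nearest neighbours."""
    edges = set()
    for i, p in enumerate(points):
        # Brute-force neighbour search by Euclidean distance; fine for a toy example.
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in order[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def segment_of(value, lo, hi, n_segments):
    """Map a lens value to an equal-width segment index in [0, n_segments - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_segments)
    return min(idx, n_segments - 1)  # the maximum value belongs to the last segment


def apply_lens(edges, lens_values, n_segments):
    """Keep only edges whose endpoints lie in the same or adjacent lens segments (assumed rule)."""
    lo, hi = min(lens_values), max(lens_values)
    seg = [segment_of(v, lo, hi, n_segments) for v in lens_values]
    return {(i, j) for (i, j) in edges if abs(seg[i] - seg[j]) <= 1}


random.seed(0)
points = [(random.random(), random.random()) for _ in range(60)]
lens_values = [random.random() for _ in range(60)]  # hypothetical analyst-chosen feature

edges = knn_edges(points, k=5)
filtered = apply_lens(edges, lens_values, n_segments=8)
print(len(edges), len(filtered))
```

In this sketch, a larger `n_segments` gives narrower segments and removes more edges, while `n_segments=1` leaves the graph unchanged. In a real workflow the filtered graph would then be laid out again to produce the adapted projection; that step is omitted here.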