p-ClustVal: A Novel p-adic Approach for Enhanced Clustering of High-Dimensional scRNASeq Data

Parichit Sharma, Sarthak Mishra, Hasan Kurban, Mehmet Dalkilic
DOI: https://doi.org/10.1101/2024.10.18.619153
2024-10-22
Abstract: This paper introduces p-ClustVal, a novel data transformation technique inspired by p-adic number theory that significantly enhances cluster discernibility in genomics data, specifically Single Cell RNA Sequencing (scRNASeq). By leveraging p-adic Valuation, p-ClustVal integrates with and augments widely used clustering algorithms and dimension reduction techniques, amplifying their effectiveness in discovering meaningful structure in data. The transformation uses a data-centric heuristic to determine optimal parameters without relying on ground truth labels, making it more user-friendly. p-ClustVal reduces overlap between clusters by employing alternate metric spaces inspired by p-adic Valuation, a significant shift from conventional methods. Our comprehensive evaluation, spanning 30 experiments and over 1200 observations, shows that p-ClustVal improves performance in 91% of cases and boosts the performance of both classical and state-of-the-art (SOTA) methods. This work contributes to data analytics and genomics by introducing a unique data transformation approach, enhancing downstream clustering algorithms, and providing empirical evidence of p-ClustVal's efficacy. The study concludes with insights into the limitations of p-ClustVal and future research directions.
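For readers unfamiliar with the term, the standard p-adic valuation of a nonzero integer n is the largest exponent k such that p^k divides n, and the associated p-adic distance between two integers a and b is p^(-k) with k the valuation of a - b. The Python sketch below illustrates only these textbook definitions. It is not the p-ClustVal transformation, whose details and parameter heuristic are not given in the abstract, and the function names are hypothetical, chosen for illustration.

```python
def p_adic_valuation(n: int, p: int) -> int:
    """Return the largest k such that p**k divides n, for nonzero n.

    Hypothetical helper illustrating the textbook definition only;
    it is not taken from the p-ClustVal paper.
    """
    if p < 2:
        raise ValueError("p must be an integer >= 2 (a prime in the classical definition)")
    if n == 0:
        raise ValueError("the valuation of 0 is conventionally +infinity")
    n = abs(n)
    k = 0
    # Divide out factors of p until none remain.
    while n % p == 0:
        n //= p
        k += 1
    return k


def p_adic_distance(a: int, b: int, p: int) -> float:
    """Return the p-adic distance p**(-v_p(a - b)), or 0.0 when a == b."""
    if a == b:
        return 0.0
    return float(p) ** (-p_adic_valuation(a - b, p))
```

Under this metric, two integers are close when their difference is divisible by a high power of p. For example, 1 and 9 differ by 8 = 2^3, so their 2-adic distance is 2^(-3), even though they are 8 apart in the usual sense.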
Bioinformatics