Systematic analysis on the horse-shoe-like effect in PCA plots of scRNA-seq data

Najeebullah Shah, Qiuchen Meng, Ziheng Zou, Xuegong Zhang
DOI: https://doi.org/10.1093/bioadv/vbae109
2024-07-29
Bioinformatics Advances
Abstract: In single-cell studies, Principal Component Analysis (PCA) is widely used to reduce the dimensionality of datasets and to visualize them in 2D or 3D PC plots. Scientists often focus on the clusters within a PC plot and overlook specific phenomena, such as the horse-shoe-like effect, that may reveal hidden knowledge about the underlying biological dataset. This phenomenon remains largely unexplored in single-cell studies. In this study, we investigate the horse-shoe-like effect in PC plots using simulated and real scRNA-seq datasets. We systematically explain the horse-shoe-like phenomenon from several interrelated perspectives. First, we establish an intuitive understanding with the help of simulated datasets. Then, we generalize the acquired knowledge to real biological scRNA-seq data. The experimental results provide logical explanations for the appearance of the horse-shoe-like effect in PC plots. Furthermore, we identify a potential problem with the well-known ‘distance saturation property’ theory, which has been credited with inducing the horse-shoe phenomenon. Finally, we analyze a mathematical model of the horse-shoe effect that suggests trigonometric solutions for the estimated eigenvectors. Comparing the results of the mathematical model with simulated and real scRNA-seq datasets, we observe a significant resemblance.
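
To make the effect described in the abstract concrete, the following is a minimal sketch of how a horse-shoe-like shape can arise in a PC plot. It is not the authors' simulation: the toy model (cells ordered along a single latent gradient, each gene with a Gaussian-shaped expression profile along that gradient), the function names `simulate_gradient` and `pca_scores`, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_gradient(n_cells=200, n_genes=60, width=0.2):
    """Toy data (assumed model): cells lie on a 1D gradient t in [0, 1];
    each gene peaks at its own position along the gradient."""
    t = np.linspace(0.0, 1.0, n_cells)
    centers = np.linspace(-0.2, 1.2, n_genes)
    # Gaussian-shaped expression profile of every gene along the gradient
    X = np.exp(-((t[:, None] - centers[None, :]) ** 2) / (2.0 * width ** 2))
    return t, X


def pca_scores(X, n_components=2):
    """PCA via SVD of the column-centered cells-by-genes matrix."""
    Xc = X - X.mean(axis=0, keepdims=True)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


t, X = simulate_gradient()
scores = pca_scores(X)
pc1, pc2 = scores[:, 0], scores[:, 1]

# PC1 should track the ordering of cells along the gradient (up to sign).
r_order = np.corrcoef(pc1, t)[0, 1]
# PC2 should be close to a quadratic function of PC1, which bends the
# one-dimensional gradient into an arch in the PC1-PC2 plane.
r_arch = np.corrcoef(pc2, pc1 ** 2)[0, 1]

print(f"|corr(PC1, t)|     = {abs(r_order):.3f}")
print(f"|corr(PC2, PC1^2)| = {abs(r_arch):.3f}")
```

In this toy setting the data are intrinsically one-dimensional, yet the second component carries no new ordering information: it is roughly a quadratic function of the first, so plotting PC2 against PC1 gives an arch. This is in line with the trigonometric eigenvector solutions mentioned in the abstract, since a cosine at twice the frequency is a quadratic function of the base cosine.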