pyVIPER: A fast and scalable Python package for rank-based enrichment analysis of single-cell RNASeq data

Alexander L.E. Wang, Zizhao Lin, Luca Zanella, Lukas Vlahos, Miquel Anglada-Girotto, Aziz Zafar, Heeju Noh, Andrea Califano, Alessandro Vasciaveo
DOI: https://doi.org/10.1101/2024.08.25.609585
2024-08-27
Abstract: Summary: Single-cell sequencing has revolutionized biomedical research by offering insights into cellular heterogeneity at unprecedented resolution. Yet the low signal-to-noise ratio characteristic of single-cell RNA sequencing (scRNASeq) challenges quantitative analyses. We have shown that gene regulatory network (GRN) analysis can help overcome this obstacle and support mechanistic elucidation of cellular state determinants, for example by using the VIPER algorithm to identify Master Regulator (MR) proteins from gene expression data. As the size and complexity of scRNASeq datasets grow, a key challenge is the need for highly scalable tools that support the analysis of datasets with up to hundreds of thousands of cells. To address it, we introduce pyVIPER, a fast, memory-efficient, and highly scalable Python toolkit for assessing protein activity in large-scale scRNASeq datasets. pyVIPER supports multiple enrichment analysis algorithms, data transformation/postprocessing modules, a novel data structure for GRN manipulation, and seamless integration with AnnData, Scanpy, and several widely adopted machine learning libraries. Compared to VIPER, benchmarking reveals orders-of-magnitude runtime reduction for large datasets, i.e., from hours to minutes, thus supporting VIPER-based analysis of virtually any large-scale single-cell dataset, as well as integration with other Python-based tools.
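To make the idea of rank-based enrichment concrete, the following is a minimal sketch in plain Python. It is a deliberately simplified illustration, not the VIPER algorithm and not the pyVIPER API: the function name `rank_enrichment`, the dictionary-based inputs, and the gene names are all hypothetical. It assumes a single cell's gene expression signature and a regulon given as target genes with a mode of regulation of +1 (activated) or -1 (repressed).

```python
import math

def rank_enrichment(signature, regulon):
    """Toy rank-based enrichment score for one regulator in one cell.

    signature: dict mapping gene -> expression signature value (hypothetical input format)
    regulon:   dict mapping target gene -> mode of regulation (+1 or -1)
    """
    # Rank genes from lowest to highest signature value.
    ordered = sorted(signature, key=signature.get)
    n = len(ordered)
    # Scale ranks into (0, 1) and center them around zero.
    centered = {gene: (i + 1) / (n + 1) - 0.5 for i, gene in enumerate(ordered)}
    # Keep only regulon targets that are present in the signature.
    targets = [gene for gene in regulon if gene in centered]
    if not targets:
        return 0.0
    # Average the sign-adjusted ranks, then scale by sqrt(number of targets)
    # so that larger regulons are not penalized by the averaging.
    mean_signed = sum(regulon[g] * centered[g] for g in targets) / len(targets)
    return mean_signed * math.sqrt(len(targets))

# Hypothetical signature: six genes with increasing expression.
signature = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0, "E": 5.0, "F": 6.0}
# Hypothetical regulon: activates E and F, represses A.
regulon = {"E": 1, "F": 1, "A": -1}
score = rank_enrichment(signature, regulon)
```

Here the activated targets sit at the top of the ranking and the repressed target at the bottom, so the score is positive, which would be read as higher inferred activity of the regulator. Working on ranks rather than raw values is what makes this style of analysis comparatively robust to the noise of single-cell measurements.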
Bioinformatics