Abstract: Probabilistic models have provided the underpinnings for state-of-the-art performance in many single-cell omics data analysis tasks, including dimensionality reduction, clustering, differential expression, annotation, removal of unwanted variation, and integration across modalities. Many of the models being deployed are amenable to scalable stochastic inference techniques, and accordingly they are able to process single-cell datasets of realistic and growing sizes. However, the community-wide adoption of probabilistic approaches is hindered by a fractured software ecosystem, resulting in an array of packages with distinct and often complex interfaces. To address this issue, we developed scvi-tools, a Python package that implements a variety of leading probabilistic methods. These methods, which cover many fundamental analysis tasks, are accessible through a standardized, easy-to-use interface with direct links to Scanpy, Seurat, and Bioconductor workflows. By standardizing the implementations, we were able to develop and reuse novel functionalities across different models, such as support for complex study designs through nonlinear removal of unwanted variation due to multiple covariates and reference-query integration via scArches. The extensible software building blocks that underlie scvi-tools also enable a developer environment in which new probabilistic models for single-cell omics can be efficiently developed, benchmarked, and deployed. We demonstrate this through a code-efficient reimplementation of Stereoscope for deconvolution of spatial transcriptomics profiles. By catering to both the end user and developer audiences, we expect scvi-tools to become an essential software dependency and serve to formulate a community standard for probabilistic modeling of single-cell omics.

### Competing Interest Statement

O.C. is supported by the EPSRC Centre for Doctoral Training in Modern Statistics and Statistical Machine Learning (EP/S023151/1) and Novo Nordisk. V.S. is a full-time employee of Serqet Therapeutics and has ownership interest in Serqet Therapeutics. F.J.T. reports receiving consulting fees from Roche Diagnostics GmbH and Cellarity, Inc., and ownership interest in Cellarity, Inc.

SEK: sparsity exploiting k-mer-based estimation of bacterial community composition

Multi-sample Estimation of Bacterial Composition Matrix in Metagenomics Data

Robust Covariance Estimation for High-dimensional Compositional Data with Application to Microbial Communities Analysis

Sparse and compositionally robust inference of microbial ecological networks

Ensemble Analysis of Adaptive Compressed Genome Sequencing Strategies

Bacterial Community Reconstruction Using A Single Sequencing Reaction

Scvi-Tools: a Library for Deep Probabilistic Analysis of Single-Cell Omics Data

Sparse distance-based learning for simultaneous multiclass classification and feature selection of metagenomic data

Finer Metagenomic Reconstruction via Biodiversity Optimization

A Novel Slope-Matrix-Graph Algorithm to Analyze Compositional Microbiome Data

A Bayesian approach to inferring the phylogenetic structure of communities from metagenomic data

A New Approach for Scalable Analysis of Microbial Communities

Beyond Asymptotics: Practical Insights into Community Detection in Complex Networks

Accurate Profiling of Microbial Communities from Massively Parallel Sequencing using Convex Optimization

Covariance Matrix Estimation for High-Throughput Biomedical Data with Interconnected Communities

Distilled Single Cell Genome Sequencing and De Novo Assembly for Sparse Microbial Communities

Real-time Taxonomic Characterization of Long-read Mixed-species Sequencing Samples in Sorted Motif Distance Space:

A semi-parametric multiple imputation method for high-sparse, high-dimensional, compositional data

A rapid phylogeny-based method for accurate community profiling of large-scale metabarcoding datasets

Analysis and correction of compositional bias in sparse sequencing count data

SparseCodePicking: feature extraction in mass spectrometry using sparse coding algorithms