Fast and interpretable non-negative matrix factorization for atlas-scale single cell data

Zachary J DeBruine, John Andrew Pospisilik, PERMUTE Consortium, Timothy J. Triche Jr.
DOI: https://doi.org/10.1101/2021.09.01.458620
2024-05-14
Abstract: Non-negative matrix factorization (NMF) is a popular method for analyzing non-negative data because its results are relatively straightforward to interpret. However, NMF has a reputation as a less efficient alternative to the singular value decomposition (SVD), a standard operation that is highly optimized in most linear algebra libraries. Sparse single-cell sequencing assays, now feasible in thousands of subjects and millions of cells, generate data matrices with tens of thousands of strictly non-negative transcript abundance entries. We present an extremely fast NMF implementation, available in the RcppML (Rcpp Machine Learning library) R package, that rivals the runtimes of state-of-the-art SVD. NMF can now be run quickly on desktop computers to analyze sparse single-cell datasets consisting of hundreds of thousands of cells. Our method improves upon current NMF implementations by introducing a scaling diagonal that increases interpretability, guarantees consistent regularization penalties across different random initializations, and ensures symmetry in symmetric factorizations. We use our method to show that NMF models learned on standard log-normalized count data are interpretable dimensional reductions, describe interpretable patterns of coordinated gene activities, and explain biologically relevant metadata. We believe NMF has the potential to replace PCA in most single-cell analyses, and the presented NMF implementation overcomes previous challenges with long runtime.
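To make the "scaling diagonal" in the abstract concrete, the sketch below takes an ordinary factorization A ≈ WH and rewrites it as A ≈ W_n diag(d) H_n, where every column of W_n and every row of H_n sums to 1 and d carries the weight of each factor. It is a minimal illustration in plain Python, not the RcppML implementation: the function names `rescale_to_diagonal` and `reconstruct` and the toy matrices are hypothetical, and the sketch assumes the diagonal is obtained by sum-normalizing the factors after the fact rather than during model fitting.

```python
def rescale_to_diagonal(W, H):
    """Split W (m x k) and H (k x n) into W_n, d, H_n so that
    W @ H equals W_n @ diag(d) @ H_n (hypothetical helper, for illustration)."""
    k = len(H)
    # Total weight of each factor in W (column sums) and in H (row sums).
    w_sums = [sum(row[j] for row in W) for j in range(k)]
    h_sums = [sum(H[j]) for j in range(k)]
    if any(s <= 0 for s in w_sums + h_sums):
        raise ValueError("every factor must have a positive sum")
    # Normalize so each column of W_n and each row of H_n sums to 1.
    W_n = [[row[j] / w_sums[j] for j in range(k)] for row in W]
    H_n = [[v / h_sums[j] for v in H[j]] for j in range(k)]
    # The scale removed from both sides is collected in the diagonal.
    d = [w_sums[j] * h_sums[j] for j in range(k)]
    return W_n, d, H_n


def reconstruct(W, d, H):
    """Compute W @ diag(d) @ H using nested lists."""
    m, k, n = len(W), len(d), len(H[0])
    return [[sum(W[i][j] * d[j] * H[j][c] for j in range(k)) for c in range(n)]
            for i in range(m)]


# Toy non-negative factors (made-up numbers, not single-cell data).
W = [[2.0, 0.0], [6.0, 1.0], [0.0, 3.0]]
H = [[1.0, 3.0, 0.0], [0.0, 5.0, 5.0]]

W_n, d, H_n = rescale_to_diagonal(W, H)
original = reconstruct(W, [1.0, 1.0], H)   # plain W @ H
rescaled = reconstruct(W_n, d, H_n)        # W_n @ diag(d) @ H_n
```

Because the factors in W_n and H_n share a common scale regardless of how W and H were initialized, a penalty applied to them acts on comparable magnitudes across runs, which is one way to read the abstract's claim about consistent regularization.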
Bioinformatics