anndata: Annotated data

Isaac Virshup, Sergei Rybakov, Fabian J. Theis, Philipp Angerer, F. Alexander Wolf
DOI: https://doi.org/10.1101/2021.12.16.473007
2021-01-01
Abstract: anndata is a Python package for handling annotated data matrices in memory and on disk ([github.com/theislab/anndata][1]), positioned between pandas and xarray. anndata offers a broad range of computationally efficient features including, among others, sparse data support, lazy operations, and a PyTorch interface.

### Statement of need

Generating insight from high-dimensional data matrices typically works through training models that annotate observations and variables via low-dimensional representations. In exploratory data analysis, this involves iterative training and analysis using original and learned annotations and task-associated representations. anndata offers a canonical data structure for book-keeping these, which is neither addressed by pandas ([McKinney, 2010][2]), nor xarray ([Hoyer & Hamman, 2017][3]), nor commonly-used modeling packages like scikit-learn ([Pedregosa et al., 2011][4]).

### Competing Interest Statement

F.J.T. consults for Immunai Inc., Singularity Bio B.V., CytoReason Ltd, and Omniscope Ltd, and has ownership interest in Cellarity Inc. and Dermagnostix GmbH. P.A. and A.W. are full-time employees of Cellarity Inc., and have ownership interest in Cellarity Inc.

[1]: http://github.com/theislab/anndata
[2]: #ref-13
[3]: #ref-9
[4]: #ref-19