A Processing and Analytics System for Microscopy Data Workflows: The Pycroscopy Ecosystem of Packages

Rama Krishnan Vasudevan, Sai Mani Valleti, Maxim Ziatdinov, Gerd Duscher, Suhas Somnath
DOI: https://doi.org/10.1002/adts.202300247
2023-09-21
Advanced Theory and Simulations
Abstract: The pycroscopy ecosystem of packages enables ingestion, machine learning and statistical analysis, processing, and visualization of a large variety of microscopy data from many different vendors in a reproducible and accessible fashion. This ecosystem enables rapid generation of analytical workflows that can form the basis of autonomous laboratories leveraging experiments with computation and simulations for physical discovery. Major advancements in fields as diverse as biology and quantum computing have relied on a multitude of microscopy techniques. Despite the considerable proliferation of these instruments, significant bottlenecks remain in terms of processing, analysis, storage, and retrieval of the acquired datasets. Aside from lack of file standards, individual domain‐specific analysis packages are often disjoint from the underlying datasets, and thus keeping track of analysis and processing steps remains tedious for the end‐user, hampering reproducibility. Here, the pycroscopy ecosystem of packages is introduced, an open‐source python‐based ecosystem underpinned by a common data model. The data model, termed the N‐dimensional spectral imaging data format, is realized in pycroscopy's sidpy package. This package is built on top of dask arrays, thus leveraging dask array attributes, but expanding them to accelerate microscopy relevant analysis and visualization. Several examples of the use of the pycroscopy ecosystem to create workflows for data ingestion and analysis of scanning transmission electron microscopy (STEM) and scanning probe microscopy data are shown. Adoption of such standardized routines will be critical to usher in the next generation of autonomous instruments where processing, computation, and meta‐data storage will be critical to overall experimental operations.
multidisciplinary sciences