Jup2Kub: algorithms and a system to translate a Jupyter Notebook pipeline to a fault tolerant distributed Kubernetes deployment

Jinli Duan, Shasha Dennis
2023-11-21
Abstract: Scientific workflows facilitate the computational, data manipulation, and sometimes visualization steps of scientific data analysis. They are vital for reproducing and validating experiments, which usually involve computational steps in scientific simulations and data analysis. These workflows are often developed by domain scientists using Jupyter notebooks, which are convenient yet face limitations: they struggle to scale with larger data sets, lack fault tolerance, and depend heavily on the stability of underlying tools and packages. To address these issues, Jup2Kub has been developed. This software system translates workflows from Jupyter notebooks into a distributed, high-performance Kubernetes environment, enhancing fault tolerance. It also manages software dependencies to maintain operational stability amidst changes in tools and packages.
Distributed, Parallel, and Cluster Computing; Software Engineering
What problem does this paper attempt to address?