Parsl+CWL: Towards Combining the Python and CWL Ecosystems

Nishchay Karle, Ben Clifford, Yadu Babuji, Ryan Chard, Daniel S. Katz, Kyle Chard
2024-12-11
Abstract: The Common Workflow Language (CWL) is a widely adopted language for defining and sharing computational workflows. It is designed to be independent of the execution engine on which workflows are executed. In this paper, we describe our experiences integrating CWL with Parsl, a Python-based parallel programming library designed to manage execution of workflows across diverse computing environments. We propose a new method that converts CWL CommandLineTool definitions into Parsl apps, enabling Parsl scripts to easily import and use tools represented in CWL. We describe a Parsl runner that is capable of executing a CWL CommandLineTool directly. We also describe a proof-of-concept extension to support inline Python in a CWL workflow definition, enabling seamless use in the Python ecosystem of Parsl. We demonstrate the benefits of this integration by presenting example CWL CommandLineTool definitions that show how they can be used in Parsl, and comparing performance of executing an image processing workflow using the Parsl integration and other CWL runners.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
This paper addresses the interoperability and portability of scientific workflows, particularly where the Python ecosystem meets the Common Workflow Language (CWL). Specifically, it explores how integrating CWL with Parsl can achieve the following goals:

1. **Improve interoperability**: As a general-purpose workflow description language, CWL lets tool definitions be shared and reused across workflow engines. By importing CWL tools directly into Parsl, researchers can execute these tools on different platforms without rewriting the tool definitions.
2. **Enhance portability**: CWL provides a standard way to describe tools so that they can be executed on different computing platforms. With CWL tools integrated into Parsl, users can rely on Parsl's flexibility and performance to scale workflows from personal computers to high-performance computing clusters.
3. **Increase development efficiency**: Python is widely used in scientific research. Integrating CWL tools directly into Parsl lets users call these tools from the Python environment, which simplifies writing and managing workflows.
4. **Support dynamic logic**: CWL allows JavaScript expressions to be embedded in workflow definitions to handle dynamic logic. To better fit Parsl's Python environment, the paper proposes an extension that supports embedding Python expressions in CWL, allowing more flexible handling of dynamic decisions in workflows.
5. **Optimize performance and scalability**: Parsl's parallel programming model and execution framework manage resource allocation, which can improve the execution efficiency and scalability of workflows. CWL tools integrated into Parsl run under the same model.
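The idea of inline Python expressions in goal 4 can be illustrated with a minimal sketch. It assumes, purely for illustration, that Python expressions reuse CWL's `$(...)` delimiters and see the job's input values through a variable named `inputs`. The paper's actual syntax may differ, and the helper name `evaluate_inline_python` and the `resize` command are hypothetical.

```python
import re

# Assumption: Python expressions are written as $(...), mirroring CWL's
# expression delimiters. Simplification: no nested parentheses inside.
_EXPR = re.compile(r"\$\(([^()]+)\)")


def evaluate_inline_python(template, inputs):
    """Replace each $(...) in `template` with the value of the Python
    expression inside it, evaluated with `inputs` in scope."""
    # Builtins are hidden to keep the sketch small; this is NOT a sandbox
    # and must not be used on untrusted workflow definitions.
    scope = {"__builtins__": {}, "inputs": inputs}

    def _substitute(match):
        return str(eval(match.group(1), scope))

    return _EXPR.sub(_substitute, template)


# Hypothetical argument string from a tool definition.
command = evaluate_inline_python(
    'resize --width $(inputs["width"] * 2) $(inputs["image"])',
    {"width": 100, "image": "photo.png"},
)
```

Here `command` becomes `resize --width 200 photo.png`. A real implementation would need a proper parser for nested expressions and a safer evaluation strategy than `eval`.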
### Main contributions of the paper

- **CWLApp**: A new Parsl app type, `CWLApp`, reads a CWL CommandLineTool definition and transparently creates a Parsl BashApp to execute the tool.
- **Parsl CWL Runner**: Parsl is made to act as a CWL runner that can directly execute CWL CommandLineTool definitions.
- **Inline Python Expressions**: A new CWL extension allows Python expressions to be embedded in CWL workflow definitions, better matching Parsl's execution environment.
- **Performance comparison**: The paper compares the performance of an image-processing workflow executed with the Parsl integration against other CWL runners to demonstrate the benefits of the integration.

In summary, by integrating CWL with Parsl, this paper addresses the interoperability, portability, and performance challenges of scientific workflows, giving researchers a more flexible workflow management tool.
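To make the `CWLApp` idea concrete, the sketch below shows one way a CommandLineTool definition could be turned into the command string that a bash-style app would run. It is an illustration under stated assumptions, not the paper's implementation: the tool is a hypothetical `echo` wrapper already parsed into a dict (a real tool would be loaded from a `.cwl` YAML file), only `baseCommand` and simple `inputBinding` entries (`position`, `prefix`) are handled, and `build_command` is a hypothetical helper name.

```python
import shlex

# Hypothetical, minimal CWL CommandLineTool, already parsed into a dict.
ECHO_TOOL = {
    "cwlVersion": "v1.2",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        "message": {"type": "string", "inputBinding": {"position": 1}},
        "no_newline": {
            "type": "boolean?",
            "inputBinding": {"position": 0, "prefix": "-n"},
        },
    },
    "outputs": {},
}


def build_command(tool, **values):
    """Assemble a shell command line from a tool definition and input values."""
    base = tool["baseCommand"]
    argv = [base] if isinstance(base, str) else list(base)

    bound = []
    for name, spec in tool["inputs"].items():
        binding = spec.get("inputBinding")
        value = values.get(name)
        if binding is None or value is None or value is False:
            continue  # unbound input, unset input, or false flag: nothing to add
        bound.append((binding.get("position", 0), name, binding.get("prefix"), value))

    # Order arguments by position, then by input name.
    for _, _, prefix, value in sorted(bound, key=lambda b: (b[0], b[1])):
        if value is True:
            if prefix is not None:
                argv.append(prefix)  # boolean flag: emit the prefix only
        elif prefix is not None:
            argv.extend([prefix, str(value)])
        else:
            argv.append(str(value))

    # Quote each part so values with spaces survive the shell.
    return " ".join(shlex.quote(part) for part in argv)


command = build_command(ECHO_TOOL, message="hello world", no_newline=True)
```

Here `command` is `echo -n 'hello world'`. In Parsl, a function decorated with `@bash_app` returns the command string to execute, so a wrapper in the spirit of `CWLApp` could return a string built this way. Handling of outputs, file staging, and the rest of the CWL specification is omitted.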