Abstract: With the advent of large language models (LLMs), there is growing interest in applying LLMs to scientific tasks. In this work, we conduct an experimental study to explore the applicability of LLMs for configuring, annotating, translating, explaining, and generating scientific workflows. We use five different workflow-specific experiments and evaluate several open- and closed-source language models using state-of-the-art workflow systems. Our studies reveal that LLMs often struggle with workflow-related tasks due to their lack of knowledge of scientific workflows. We further observe that the performance of LLMs varies across experiments and workflow systems. Our findings can help workflow developers and users understand LLMs' capabilities in scientific workflows, and motivate further research applying LLMs to workflows.

What problem does this paper attempt to address?

This paper evaluates how applicable large language models (LLMs) are to scientific workflows. Specifically, it uses a series of experiments to assess how well LLMs configure, annotate, translate, interpret, and generate scientific workflows.

### Core Problems of the Paper

1. **Complexity of Scientific Workflows**:
   - Scientific workflows usually involve multiple interrelated tasks and have extensive data and computing requirements.
   - Current LLMs may lack an in-depth understanding of scientific workflows, resulting in poor performance when handling these tasks.
2. **Difficulty in Using Scientific Workflow Systems**:
   - Although scientific workflow systems can simplify task management and data exchange, many scientists find these systems difficult to use and often choose to run tasks manually or develop their own solutions.
   - LLMs have the potential to help solve these problems, but their capabilities need in-depth research and evaluation.
3. **Limitations of Existing Research**:
   - Previous research has mainly focused on specific tasks related to high-performance computing (HPC), such as code generation, annotation, and answering queries.
   - There is a lack of comprehensive research on the broad application of LLMs in complete workflow systems.

### Research Objectives

- **Evaluate the Capabilities of LLMs**: Through multiple experiments, evaluate the performance of different LLMs in scientific workflows, including configuring, annotating, translating, interpreting, and generating workflows.
- **Reveal the Advantages and Limitations of LLMs**: Identify the strengths and weaknesses of LLMs in handling scientific workflow tasks.
- **Promote Further Research**: Help workflow developers and users understand how LLMs apply to scientific workflows, and stimulate more research on applying LLMs to them.
### Experimental Setup

The paper uses five different experiments to evaluate the performance of LLMs in scientific workflows:

1. **Workflow Configuration**: Examine the ability of LLMs to generate workflow configuration scripts.
2. **Task Code Annotation**: Evaluate the ability of LLMs to automatically annotate user task code.
3. **Task Code Translation**: Test the ability of LLMs to translate task code between different workflow systems.
4. **Workflow Interpretation**: Evaluate the ability of LLMs to understand and interpret scientific workflows.
5. **Mini-Application Development**: Require LLMs to develop workflow benchmarking programs that combine HPC and AI tasks.

### Conclusions

Through these experiments, the paper aims to provide an in-depth understanding of the performance of LLMs in scientific workflows and to point out their advantages and limitations, thereby providing guidance for future research and applications.

---

In summary, this paper aims to evaluate the applicability of large language models in scientific workflows through empirical research, to reveal their potential advantages and limitations, and to provide valuable insights for the development and application of scientific workflows.

Do Large Language Models Speak Scientific Workflows?
