Abstract: Recent developments in Artificial Intelligence (AI) models have propelled their application in scientific discovery, but validating and exploring these discoveries requires subsequent empirical experimentation. The concept of self-driving laboratories promises to automate, and thus accelerate, the experimental process that follows AI-driven discoveries. However, translating experimental protocols, originally written for human comprehension, into formats that machines can interpret presents significant challenges. Within a specific expert domain, these include the need for structured rather than natural language, the need for explicit rather than tacit knowledge, and the preservation of causality and consistency across protocol steps. At present, protocol translation relies predominantly on the manual, labor-intensive involvement of domain experts and information technology specialists, which makes the process time-consuming. To address these issues, we propose a framework that automates protocol translation through a three-stage workflow, incrementally constructing Protocol Dependence Graphs (PDGs) that are structured at the syntax level, complete at the semantics level, and linked at the execution level. Quantitative and qualitative evaluations demonstrate performance on par with that of human experts, underscoring the framework's potential to substantially expedite and democratize scientific discovery by extending the automation capabilities of self-driving laboratories.
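To make the "linked" property and the preservation of causality across steps more concrete, here is a minimal sketch of a dependence graph over protocol steps. It is an assumption for illustration, not the authors' PDG construction: each hypothetical step lists the materials it consumes and produces, a step depends on whichever step produced its inputs, and Python's standard-library `graphlib.TopologicalSorter` derives an execution order that respects those dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical protocol steps: each lists the materials it consumes and produces.
STEPS = {
    "dissolve": {"consumes": {"sodium chloride", "distilled water"},
                 "produces": {"salt solution"}},
    "cool": {"consumes": {"salt solution"}, "produces": {"cooled solution"}},
    "transfer": {"consumes": {"cooled solution"}, "produces": {"beaker contents"}},
}


def build_dependence_graph(steps: dict) -> dict:
    """Map each step to the set of steps it depends on (its predecessors)."""
    producer = {}
    for name, io in steps.items():
        for material in io["produces"]:
            if material in producer:
                raise ValueError(f"{material!r} is produced by more than one step")
            producer[material] = name
    graph = {name: set() for name in steps}
    for name, io in steps.items():
        for material in io["consumes"]:
            # Raw inputs (not produced by any step) add no edge.
            if material in producer:
                graph[name].add(producer[material])
    return graph


graph = build_dependence_graph(STEPS)
# static_order() raises graphlib.CycleError if the steps depend on each other circularly.
order = list(TopologicalSorter(graph).static_order())
print(order)  # ['dissolve', 'cool', 'transfer']
```

Linking steps through the materials they exchange, rather than through their textual order, is one simple way to make causal dependencies explicit and checkable.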

What problem does this paper attempt to address?

The paper aims to convert experimental protocols from human-readable natural language into a structured format that automated systems, such as self-driving laboratories, can understand and execute. It focuses on three key challenges:

1. **Syntax level**
   - **Mapping operations to conditions**: In natural-language protocols, operations and their parameters (such as input reagents and conditions) are usually intertwined, as in "Dissolve 10 grams of sodium chloride in 100 milliliters of distilled water at 80 °C". Such descriptions are hard for machines to parse, so operations and conditions must be precisely separated and represented.
   - **Control flow of operations**: Linear and non-linear control flow is usually implicit in natural-language text. For example, "Repeat the titration until the end point is reached, and then record the volume of the titrant used" describes a loop with a termination condition followed by a sequential step, which machines find difficult to interpret correctly.
2. **Semantic level**
   - **Latent semantics of known unknowns**: Default values of some parameters are treated as common knowledge among domain experts, so protocols omit the specific values or use domain-conventional substitutes. For example, in "Dry the purified product at room temperature", "room temperature" usually means 20–25 °C, but a machine cannot infer this. All such parameter values must be made explicit.
   - **Latent semantics of unknown unknowns**: Key parameters of some operations are sometimes omitted, unintentionally or intentionally, resulting in information loss. For example, "Centrifuge the sample after adding the enzyme" specifies neither the centrifugation speed nor its duration. This is a problem for machines, which need every parameter of every operation to be explicit.
3. **Execution level**
   - **Resource capacity**: Protocols often omit explicit statements about resource capacity, which can cause execution errors such as exceeding a device's maximum capacity. For example, "Transfer the mixture to a beaker" requires selecting a beaker large enough to hold the accumulated volume.
   - **Operation safety**: Beyond resource capacity, runtime errors can also arise from operations that are semantically valid but lead to adverse or dangerous outcomes in a particular execution context. For example, "Heat the reaction mixture to 70 °C" may be safe or dangerous depending on the composition of the mixture.

To solve these problems, the paper proposes a framework that automates protocol translation through a three-stage workflow, incrementally constructing Protocol Dependence Graphs (PDGs). The framework addresses the challenges at all three levels, with the aim of substantially accelerating and democratizing scientific discovery and improving the automation capabilities of self-driving laboratories.

### Main contributions

1. **Systematic analysis of existing differences**: The paper systematically analyzes how protocol translation differs between human experimenters and automated systems, and derives design principles from this analysis that emulate the human cognitive process of protocol translation.
2. **Autonomous protocol translator**: It proposes an autonomous protocol translator that incrementally constructs PDGs through a three-stage framework covering the syntax, semantics, and execution levels.
3. **Performance evaluation**: Quantitative and qualitative evaluations show that the translator performs close to skilled human experimenters across various experimental science domains and substantially outperforms alternatives based solely on large language models (LLMs) on protocol translation tasks.

In summary, the paper aims to advance self-driving laboratories and accelerate scientific research by reducing the need for human intervention through automated protocol translation.
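As a rough illustration of the syntax-level challenge, the following sketch separates the operation from its arguments in the sodium chloride sentence quoted above. The `Step`/`Quantity` schema and the regular expression are hypothetical, and the pattern handles only this one sentence shape; it is not a description of how the paper's translator parses text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Quantity:
    value: float
    unit: str


@dataclass
class Step:
    """One structured protocol step: an operation plus named arguments (hypothetical schema)."""
    operation: str
    arguments: dict = field(default_factory=dict)


# Toy pattern for "<Verb> <n> <unit> of <solute> in <n> <unit> of <solvent> at <t> °C" only.
DISSOLVE_PATTERN = re.compile(
    r"(?P<op>\w+) (?P<m>[\d.]+) (?P<m_unit>\w+) of (?P<solute>[\w ]+?)"
    r" in (?P<v>[\d.]+) (?P<v_unit>\w+) of (?P<solvent>[\w ]+?)"
    r" at (?P<t>[\d.]+) ?°C"
)


def parse_dissolve(sentence: str) -> Step:
    match = DISSOLVE_PATTERN.fullmatch(sentence.strip().rstrip("."))
    if match is None:
        raise ValueError(f"unsupported sentence shape: {sentence!r}")
    g = match.groupdict()
    # The operation and each condition become separate, typed fields.
    return Step(
        operation=g["op"].lower(),
        arguments={
            "solute": g["solute"],
            "solute_amount": Quantity(float(g["m"]), g["m_unit"]),
            "solvent": g["solvent"],
            "solvent_amount": Quantity(float(g["v"]), g["v_unit"]),
            "temperature": Quantity(float(g["t"]), "°C"),
        },
    )


step = parse_dissolve(
    "Dissolve 10 grams of sodium chloride in 100 milliliters of distilled water at 80 °C"
)
print(step.operation, step.arguments["solute"])  # dissolve sodium chloride
```

A hand-written pattern like this breaks on any rephrasing, which is exactly why the syntax level is hard for real protocols; the value of the sketch is the target representation, not the parser.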
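The titration example hides a loop with a termination condition followed by a sequential step. A minimal sketch of how that implicit control flow could be made explicit as a small tree of nodes, assuming hypothetical `Action`, `RepeatUntil`, and `Sequential` node types and a simulated titration (0.5 mL per step, end point at 12.5 mL) in place of real hardware:

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class Action:
    """A primitive operation that updates the (simulated) lab state."""
    name: str
    run: Callable[[dict], None]


@dataclass
class RepeatUntil:
    """Do-while loop: run the body, then stop once the condition holds."""
    body: "Node"
    condition: Callable[[dict], bool]
    max_iterations: int = 1000  # guard against an end point that is never reached


@dataclass
class Sequential:
    """Run child nodes in order."""
    steps: List["Node"]


Node = Union[Action, RepeatUntil, Sequential]


def execute(node: Node, state: dict) -> None:
    if isinstance(node, Action):
        node.run(state)
    elif isinstance(node, Sequential):
        for child in node.steps:
            execute(child, state)
    elif isinstance(node, RepeatUntil):
        for _ in range(node.max_iterations):
            execute(node.body, state)
            if node.condition(state):
                return
        raise RuntimeError("loop did not reach its end condition")
    else:
        raise TypeError(f"unknown node type: {type(node).__name__}")


# Hypothetical simulation standing in for a real titrator and end-point sensor.
def add_titrant(state: dict) -> None:
    state["titrant_ml"] += 0.5


def end_point_reached(state: dict) -> bool:
    return state["titrant_ml"] >= state["end_point_ml"]


def record_volume(state: dict) -> None:
    state["recorded_ml"] = state["titrant_ml"]


protocol = Sequential([
    RepeatUntil(Action("titrate", add_titrant), end_point_reached),
    Action("record titrant volume", record_volume),
])

state = {"titrant_ml": 0.0, "end_point_ml": 12.5}
execute(protocol, state)
print(state["recorded_ml"])  # 12.5
```

`RepeatUntil` runs its body at least once before checking the condition, matching the "repeat ... until" reading; `max_iterations` is an added safeguard, not something the protocol states.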
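The two semantic-level cases call for different handling. In the following sketch, which assumes a hypothetical convention table and a hypothetical set of required parameters per operation, a known unknown such as "room temperature" is resolved by lookup, while an unknown unknown (the centrifuge step's missing speed and duration) cannot be inferred and is only reported as missing.

```python
# Hypothetical convention table: domain shorthand -> explicit value range (°C).
CONVENTIONS = {"room temperature": (20.0, 25.0)}

# Hypothetical schema: parameters each operation needs before a machine can run it.
REQUIRED_PARAMS = {
    "dry": {"target", "temperature"},
    "centrifuge": {"target", "speed_rpm", "duration_min"},
}


def complete_step(operation: str, params: dict) -> tuple:
    """Resolve known unknowns from conventions; report unknown unknowns as missing."""
    completed = {
        key: CONVENTIONS.get(value, value) if isinstance(value, str) else value
        for key, value in params.items()
    }
    missing = REQUIRED_PARAMS.get(operation, set()) - set(completed)
    return completed, missing


dried, dry_gaps = complete_step(
    "dry", {"target": "purified product", "temperature": "room temperature"}
)
print(dried["temperature"], dry_gaps)  # (20.0, 25.0) set()

spun, spin_gaps = complete_step("centrifuge", {"target": "sample"})
print(sorted(spin_gaps))  # ['duration_min', 'speed_rpm']
```

Keeping "room temperature" as a range rather than a single number preserves the convention's tolerance; flagged gaps would still need an expert or another knowledge source to fill.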
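For the execution-level checks, a minimal sketch can track a mixture's accumulated volume and components, pick the smallest adequate beaker, and flag components whose heating limit the target temperature exceeds. The beaker inventory, the component names, and their temperature limits are all hypothetical placeholders, not data from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical beaker inventory (mL) and per-component heating limits (°C).
BEAKER_SIZES_ML = (50, 100, 250, 500, 1000)
MAX_SAFE_TEMP_C = {"component_a": 90.0, "component_b": 60.0}


@dataclass
class Mixture:
    volume_ml: float = 0.0
    components: set = field(default_factory=set)

    def add(self, component: str, volume_ml: float) -> None:
        # Volume accumulates across steps, which is what the capacity check needs.
        self.volume_ml += volume_ml
        self.components.add(component)


def choose_beaker(mixture: Mixture, sizes=BEAKER_SIZES_ML) -> int:
    """Pick the smallest beaker that holds the accumulated volume."""
    fitting = [size for size in sizes if size >= mixture.volume_ml]
    if not fitting:
        raise ValueError(f"no beaker holds {mixture.volume_ml} mL")
    return min(fitting)


def unsafe_components(mixture: Mixture, target_c: float, limits=MAX_SAFE_TEMP_C) -> list:
    """Components whose limit is below the target, or unknown (flagged conservatively)."""
    return sorted(
        c for c in mixture.components
        if limits.get(c) is None or target_c > limits[c]
    )


mixture = Mixture()
mixture.add("component_a", 120.0)
mixture.add("component_b", 80.0)
print(choose_beaker(mixture))            # 250
print(unsafe_components(mixture, 70.0))  # ['component_b']
```

Components without a known limit are flagged rather than assumed safe, which errs on the side of caution when the execution context is incomplete.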

Expert-level protocol translation for self-driving labs

Towards Automatically Reverse Engineering Vehicle Diagnostic Protocols

Transcription Between Human-Readable Synthetic Descriptions and Machine-Executable Instructions: an Application of the Latest Pre-Training Technology

The future of self-driving laboratories: From Human in the Loop Interactive AI to Gamification

Autonomous chemical science and engineering enabled by self-driving laboratories

A dynamic knowledge graph approach to distributed self-driving laboratories

Synergizing Human Expertise and AI Efficiency with Language Model for Microscopy Operation and Automated Experiment Design

DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models

A multi-agent-driven robotic AI chemist enabling autonomous chemical research on demand

ChatGPT As Your Vehicle Co-Pilot: an Initial Attempt.

Data-Centric Architecture for Self-Driving Laboratories with Autonomous Discovery of New Nanomaterials

The rise of self-driving labs in chemical and materials sciences

AI-chemist for Chemistry Synthesis, Property Characterization, and Performance Testing

An all-round AI-Chemist with a scientific mind

Exploring Human-Like Translation Strategy with Large Language Models

hmCodeTrans: Human-Machine Interactive Code Translation

Collaborative Intelligence in Sequential Experiments: A Human-in-the-Loop Framework for Drug Discovery

Equipping data-driven experiment planning for Self-driving Laboratories with semantic memory: case studies of transfer learning in chemical reaction optimization

Personalized Autonomous Driving with Large Language Models: Field Experiments

Domain Knowledge Distillation from Large Language Model: An Empirical Study in the Autonomous Driving Domain

Automating Exploratory Proteomics Research via Language Models