Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister

2023-08-02

Abstract: Today, large language models (LLMs) are taught to use new tools by providing a few demonstrations of the tool's usage. Unfortunately, demonstrations are hard to acquire, and can result in undesirable biased usage if the wrong demonstration is chosen. Even in the rare scenario that demonstrations are readily available, there is no principled selection protocol to determine how many and which ones to provide. As tasks grow more complex, the selection search grows combinatorially and invariably becomes intractable. Our work provides an alternative to demonstrations: tool documentation. We advocate the use of tool documentation, descriptions for the individual tool usage, over demonstrations. We substantiate our claim through three main empirical findings on 6 tasks across both vision and language modalities. First, on existing benchmarks, zero-shot prompts with only tool documentation are sufficient for eliciting proper tool usage, achieving performance on par with few-shot prompts. Second, on a newly collected realistic tool-use dataset with hundreds of available tool APIs, we show that tool documentation is significantly more valuable than demonstrations, with zero-shot documentation significantly outperforming few-shot without documentation. Third, we highlight the benefits of tool documentations by tackling image generation and video tracking using just-released unseen state-of-the-art models as tools. Finally, we highlight the possibility of using tool documentation to automatically enable new applications: by using nothing more than the documentation of GroundingDino, Stable Diffusion, XMem, and SAM, LLMs can re-invent the functionalities of the just-released Grounded-SAM and Track Anything models.

Computation and Language, Artificial Intelligence, Computer Vision and Pattern Recognition, Machine Learning

What problem does this paper attempt to address?

This paper asks how large language models (LLMs) can learn to use new tools from tool documentation (docs) rather than from example demonstrations (demos). Today, LLMs are typically taught a new tool by being shown a few examples of its use, and this approach has several problems: demonstrations are hard to acquire, and a poorly chosen demonstration can bias how the model uses the tool. There is also no principled protocol for deciding how many demonstrations to provide or which ones, and as tasks grow more complex the selection search grows combinatorially. The paper therefore proposes an alternative: supply the documentation that describes each tool's usage, which enables zero-shot tool use. This reduces the dependence on demonstrations and makes it easier to scale LLMs to a large number of tools. On existing benchmarks, zero-shot prompts containing only tool documentation perform on par with few-shot prompts; on a newly collected dataset with hundreds of tool APIs, zero-shot prompts with documentation significantly outperform few-shot prompts without it. The paper also shows that, by reading documentation, LLMs can integrate new vision and video tools to solve previously unseen tasks, such as image editing and video tracking.
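The contrast between the two prompting styles can be illustrated with a minimal sketch. It is not the paper's prompt format: the tool names (`detect_objects`, `segment_region`), the `ToolDoc` fields, and the prompt wording are all hypothetical, chosen only to show that a documentation prompt carries tool descriptions and no worked examples, while a demonstration prompt carries worked examples and no descriptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ToolDoc:
    """Documentation for one tool (all fields are illustrative)."""
    name: str
    description: str
    usage: str


def build_doc_prompt(tools: List[ToolDoc], task: str) -> str:
    """Zero-shot prompt: tool documentation only, no demonstrations."""
    lines = ["You can call the following tools."]
    for tool in tools:
        lines.append(f"- {tool.name}: {tool.description}")
        lines.append(f"  Usage: {tool.usage}")
    lines.append(f"Task: {task}")
    lines.append("Plan:")
    return "\n".join(lines)


def build_demo_prompt(demos: List[Tuple[str, str]], task: str) -> str:
    """Few-shot prompt: worked demonstrations only, no documentation."""
    lines = []
    for demo_task, demo_plan in demos:
        lines.append(f"Task: {demo_task}")
        lines.append(f"Plan: {demo_plan}")
    lines.append(f"Task: {task}")
    lines.append("Plan:")
    return "\n".join(lines)


# Hypothetical tools, invented for this sketch; not real library APIs.
TOOLS = [
    ToolDoc(
        name="detect_objects",
        description="Returns bounding boxes for objects matching a text query.",
        usage="detect_objects(image, query)",
    ),
    ToolDoc(
        name="segment_region",
        description="Returns a mask for the region inside a bounding box.",
        usage="segment_region(image, box)",
    ),
]

TASK = "Segment every dog in the image."

doc_prompt = build_doc_prompt(TOOLS, TASK)
demo_prompt = build_demo_prompt(
    [("Find the cat.", "detect_objects(image, 'cat')")], TASK
)
```

Either string would then be sent to an LLM of choice. In the documentation variant, adding a tool means appending one `ToolDoc` entry, with no need to pick or write demonstrations for it.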

From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions

EASYTOOL: Enhancing LLM-based Agents with Concise Tool Instruction

ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs

GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction

Efficient and Scalable Estimation of Tool Representations in Vector Space

ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases

Tool Learning with Large Language Models: A Survey

Creative Robot Tool Use with Large Language Models

MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use

MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation

ToolGen: Unified Tool Retrieval and Calling via Generation

Towards Autonomous Tool Utilization in Language Models: A Unified, Efficient and Scalable Framework

Toolformer: Language Models Can Teach Themselves to Use Tools

Towards Completeness-Oriented Tool Retrieval for Large Language Models

T-Eval: Evaluating the Tool Utilization Capability of Large Language Models Step by Step

TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use

TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems

ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios

Towards Tool Use Alignment of Large Language Models

Planning, Creation, Usage: Benchmarking LLMs for Comprehensive Tool Utilization in Real-World Complex Scenarios