Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval

Yanfei Chen, Jinsung Yoon, Devendra Singh Sachan, Qingze Wang, Vincent Cohen-Addad, Mohammadhossein Bateni, Chen-Yu Lee, Tomas Pfister
2024-09-21
Abstract: Recent advances in large language models (LLMs) have enabled autonomous agents with complex reasoning and task-fulfillment capabilities using a wide range of tools. However, effectively identifying the most relevant tools for a given task becomes a key bottleneck as the toolset size grows, hindering reliable tool utilization. To address this, we introduce Re-Invoke, an unsupervised tool retrieval method designed to scale effectively to large toolsets without training. Specifically, we first generate a diverse set of synthetic queries that comprehensively cover different aspects of the query space associated with each tool document during the tool indexing phase. Second, we leverage LLMs' query understanding capabilities to extract key tool-related context and underlying intents from user queries during the inference phase. Finally, we employ a novel multi-view similarity ranking strategy based on intents to pinpoint the most relevant tools for each query. Our evaluation demonstrates that Re-Invoke significantly outperforms state-of-the-art alternatives in both single-tool and multi-tool scenarios, all within a fully unsupervised setting. Notably, on the ToolE datasets, we achieve a 20% relative improvement in nDCG@5 for single-tool retrieval and a 39% improvement for multi-tool retrieval.
Computation and Language
What problem does this paper attempt to address?
The paper addresses the difficulty LLM-based agents have in identifying the most relevant tools when the toolset is large. As the toolset grows, finding the right tool for a given task becomes a critical bottleneck that limits the reliability of tool use. To solve this problem, the authors propose Re-Invoke, an unsupervised tool retrieval method that scales to large toolsets without training. Re-Invoke works in three steps:

1. **Query Generator**: During the tool indexing phase, it automatically generates a diverse set of synthetic queries that cover different aspects of the query space related to each tool document.
2. **Intent Extractor**: During the inference phase, it uses the query understanding capabilities of LLMs to extract key tool-related context and underlying intents from user queries.
3. **Multi-View Similarity Ranking Strategy**: Based on the extracted intents, it applies a novel multi-view similarity ranking strategy to locate the most relevant tools for each query.

Experimental results show that Re-Invoke significantly outperforms existing state-of-the-art alternatives in both single-tool and multi-tool scenarios, all in a fully unsupervised setting. On the ToolE dataset, it achieves a 20% relative improvement in nDCG@5 for single-tool retrieval and a 39% improvement for multi-tool retrieval. The paper also reports superior results in end-to-end evaluations, further supporting the method's effectiveness in practical applications.
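The three steps above can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several pieces are assumptions made for illustration: a toy bag-of-words `embed` stands in for a real text encoder, the synthetic queries and intents are written by hand instead of being produced by an LLM, a tool's score is taken as the mean similarity over its views, and the per-intent rankings are merged by simple round-robin interleaving. All function names (`embed`, `cosine`, `index_tools`, `score`, `rank_tools`) are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a real encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def index_tools(tool_docs, synthetic_queries):
    """Indexing phase: one vector per (tool document + synthetic query) view."""
    index = {}
    for name, doc in tool_docs.items():
        queries = synthetic_queries.get(name) or [""]
        index[name] = [embed(doc + " " + q) for q in queries]
    return index


def score(intent_vec, views):
    """Assumed scoring rule: mean similarity between an intent and a tool's views."""
    return sum(cosine(intent_vec, v) for v in views) / len(views)


def rank_tools(intents, index, k):
    """Inference phase: rank tools per intent, then interleave the rankings."""
    per_intent = []
    for intent in intents:
        vec = embed(intent)
        ranked = sorted(index, key=lambda name: score(vec, index[name]), reverse=True)
        per_intent.append(ranked)

    merged = []
    # Assumed merge rule: take each intent's best remaining tool in turn.
    for position in range(len(index)):
        for ranked in per_intent:
            tool = ranked[position]
            if tool not in merged:
                merged.append(tool)
            if len(merged) == k:
                return merged
    return merged


tool_docs = {
    "weather": "get the weather forecast for a city",
    "flights": "search flight tickets between cities",
    "calculator": "evaluate arithmetic expressions",
}
synthetic_queries = {
    "weather": ["will it rain tomorrow in paris"],
    "flights": ["book a flight from london to tokyo"],
    "calculator": ["what is 12 times 7"],
}
# Hand-written intents for the query
# "Check the weather in Tokyo and book me a flight there."
intents = ["check the weather forecast in tokyo", "book a flight to tokyo"]

index = index_tools(tool_docs, synthetic_queries)
top_tools = rank_tools(intents, index, k=2)
```

Ranking per intent and then interleaving means a multi-tool query can surface one tool for each intent, rather than letting the tool that best matches a single intent dominate the top of the list.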