Abstract: Recent advancements in function calling and tool use have significantly enhanced the capabilities of large language models (LLMs) by enabling them to interact with external information sources and execute complex tasks. However, the limited context window of LLMs presents challenges when a large number of tools are available, necessitating efficient methods to manage prompt length and maintain accuracy. Existing approaches, such as fine-tuning LLMs or leveraging their reasoning capabilities, either require frequent retraining or incur significant latency overhead. A more efficient solution involves training smaller models to retrieve the most relevant tools for a given query, although this requires high-quality, domain-specific data. To address these challenges, we present a novel framework for generating synthetic data for tool retrieval applications and an efficient data-driven tool retrieval strategy using small encoder models. Empowered by LLMs, we create ToolBank, a new tool retrieval dataset that reflects real human usage. For tool retrieval methodologies, we propose novel approaches: (1) Tool2Vec: usage-driven tool embedding generation for tool retrieval, (2) ToolRefiner: a staged retrieval method that iteratively improves the quality of retrieved tools, and (3) MLC: framing tool retrieval as a multi-label classification problem. With these new methods, we achieve improvements of up to 27.28 in Recall@K on the ToolBench dataset and 30.5 in Recall@K on ToolBank. Additionally, we present further experimental results to rigorously validate our methods. Our code is available at https://github.com/SqueezeAILab/Tool2Vec

What problem does this paper attempt to address?

The paper addresses how to retrieve relevant tools efficiently and accurately when large language models (LLMs) invoke tools under a limited context window. When many tools are available, the challenge is to keep the prompt short without losing accuracy. Existing methods, such as fine-tuning LLMs or using their reasoning ability to select tools, either require frequent retraining or add significant latency. Retrieval methods based on tool descriptions also perform poorly, because there is a clear semantic gap between tool descriptions and user queries. To close this gap, the paper proposes a usage-driven tool embedding method (Tool2Vec) and a two-stage tool retrieval technique, both aimed at improving the efficiency and accuracy of tool retrieval.

### Main Contributions

1. **ToolBank Dataset**: A new high-quality, domain-specific tool retrieval dataset, ToolBank, is constructed with the paper's synthetic data generation framework, and three new datasets are instantiated within this framework. In a quality evaluation judged by GPT-4-turbo, these datasets achieved a 60% higher win rate than ToolBench queries.
2. **Tool2Vec**: A usage-based tool embedding method is proposed in place of the traditional approach that relies on tool descriptions. In addition, a two-stage tool retrieval method is introduced, which progressively improves the quality of retrieved tools through a "retrieve-then-refine" scheme.
3. **Performance Improvement**: On the most difficult ToolBench split, the recall of this method is more than 25% higher than that of the ToolBench retriever. On domain-specific datasets, its recall is more than 30% higher than that of description-based retrieval.

### Method Overview

1. **Tool2Vec**: Tool embeddings are generated from user queries associated with each tool instead of from tool descriptions. This reduces the distribution gap between queries and tool embeddings and improves retrieval accuracy.
2. **Multi-Label Classification (MLC)**: Tool retrieval is recast as a multi-label classification problem, and the model is trained to predict whether each tool is relevant to a given query.
3. **ToolRefiner**: Acting as the second-stage refiner, it re-evaluates the candidate tools retrieved in the first stage. ToolRefiner improves retrieval accuracy by considering both tool-query and tool-tool interactions.

### Experimental Results

The paper evaluates the proposed methods on multiple benchmark datasets, including ToolBench and ToolBank. The results show that the proposed methods significantly outperform existing baselines on all reported metrics, most notably Recall@K.

### Conclusion

By introducing Tool2Vec and the two-stage tool retrieval method, the paper tackles the problem of retrieving relevant tools efficiently and accurately from a large tool set, offering a new option for tool invocation in practical LLM applications.
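
The Tool2Vec idea from the Method Overview can be illustrated with a minimal sketch. It is not the authors' implementation: the `embed` function is a toy bag-of-words stand-in for a real encoder model, averaging the query embeddings per tool is an assumed aggregation (the summary does not state which one is used), and the tool names, example queries, and helper names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def embed(text):
    """Toy stand-in for an encoder model: a unit-length bag-of-words vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_tool_embeddings(examples):
    """Embed each tool as the mean embedding of the queries that use it.

    Averaging is an assumption made for this sketch.
    """
    sums = defaultdict(lambda: defaultdict(float))
    uses = Counter()
    for query, tools in examples:
        query_vec = embed(query)
        for tool in tools:
            uses[tool] += 1
            for tok, w in query_vec.items():
                sums[tool][tok] += w
    return {
        tool: {tok: w / uses[tool] for tok, w in vec.items()}
        for tool, vec in sums.items()
    }


def retrieve(query, tool_vecs, k=2):
    """Return the k tools whose usage-based embeddings best match the query."""
    query_vec = embed(query)
    ranked = sorted(
        tool_vecs, key=lambda t: cosine(query_vec, tool_vecs[t]), reverse=True
    )
    return ranked[:k]


# Hypothetical (query, tools used) pairs standing in for a dataset such as ToolBank.
examples = [
    ("what is the weather in Paris tomorrow", ["get_weather"]),
    ("will it rain in Berlin this weekend", ["get_weather"]),
    ("email the weather forecast to Alice", ["get_weather", "send_email"]),
    ("send an email to Bob about the meeting", ["send_email"]),
    ("schedule a meeting with Carol on Friday", ["create_event"]),
]
tool_vecs = build_tool_embeddings(examples)
print(retrieve("is it going to rain in Paris", tool_vecs, k=2))
```

Because each tool vector is built from queries rather than from a description, an incoming query is compared against text drawn from the same distribution, which is the gap-reduction argument made above. In a two-stage setup, the top-k list returned here would be the candidate set handed to a refiner.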

Efficient and Scalable Estimation of Tool Representations in Vector Space
