GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution

Yining Lu, Haoping Yu, Daniel Khashabi
DOI: https://doi.org/10.48550/arXiv.2307.08775
2024-01-31
Abstract: Augmenting large language models (LLM) to use external tools enhances their performance across a variety of tasks. However, prior works over-rely on task-specific demonstration of tool use that limits their generalizability and computational cost due to making many calls to large-scale LLMs. We introduce GEAR, a computationally efficient query-tool grounding algorithm that is generalizable to various tasks that require tool use while not relying on task-specific demonstrations. GEAR achieves better efficiency by delegating tool grounding and execution to small language models (SLM) and LLM, respectively; while leveraging semantic and pattern-based evaluation at both question and answer levels for generalizable tool grounding. We evaluate GEAR on 14 datasets across 6 downstream tasks, demonstrating its strong generalizability to novel tasks, tools and different SLMs. Despite offering more efficiency, GEAR achieves higher precision in tool grounding compared to prior strategies using LLM prompting, thus improving downstream accuracy at a reduced computational cost. For example, we demonstrate that GEAR-augmented GPT-J and GPT-3 outperform counterpart tool-augmented baselines because of better tool use.
Artificial Intelligence
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses two main problems that large language models (LLMs) face when using external tools:

1. **Dependence on task-specific demonstrations**: Prior work relies heavily on task-specific demonstrations of tool use, which limits generalization and, because it requires frequent calls to large-scale LLMs, raises computational cost.
2. **Low computational efficiency**: Existing methods make many LLM calls during tool selection and execution, resulting in high computational cost.

To overcome these problems, the authors propose **GEAR** (Generalizable and Efficient Tool Resolution), a computationally efficient query-tool grounding algorithm that generalizes to a variety of tool-use tasks without task-specific demonstrations. The main features of GEAR are:

- **Efficiency**: Tool grounding is delegated to small language models (SLMs) and tool execution to the LLM, which reduces computational cost.
- **Generalization ability**: Semantic and pattern-based evaluation at both the question and answer levels makes tool grounding general, improving generalization to new tasks, new tools, and different SLMs.
- **Accuracy**: GEAR grounds tools with higher precision than existing LLM-prompting methods, which improves downstream task accuracy.

### Main contributions

1. **A new query-tool grounding algorithm**: GEAR selects the most appropriate tool by combining semantic similarity and pattern similarity, improving the accuracy and generalization of tool grounding.
2. **Improved computational efficiency**: Assigning most of the computation to small language models reduces the number of LLM calls and therefore the computational cost.
3. **Extensive experimental verification**: Experiments on 14 datasets covering 6 downstream tasks demonstrate GEAR's strong generalization to new tasks, new tools, and different SLMs.

### Experimental results

- **Downstream task performance**: With a tool library containing 4 basic tools, GEAR outperforms all baseline models on four basic tasks. For example, on open-domain question answering (ODQA), GEAR-augmented GPT-J is 24.3% and 6.7% more accurate than the zero-shot and few-shot baselines, respectively.
- **Tool-grounding accuracy**: With the tool library expanded to 10 tools, GEAR achieves high tool-grounding accuracy, especially on arithmetic and machine-translation tasks. On more open-ended natural language processing tasks, such as open-domain question answering and commonsense question answering, GEAR's grounding strategy also generalizes better.

### Conclusion

Through an efficient and generalizable query-tool grounding algorithm, GEAR improves both the performance and the computational efficiency of large language models that use external tools, and suggests a direction for future research on tool use.
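
To make the tool-selection idea from the contributions concrete, here is a minimal Python sketch of scoring each tool by a question-level semantic similarity and an answer-level pattern similarity, then picking the highest combined score. It is an illustration under stated assumptions, not the authors' implementation: the bag-of-words cosine stands in for whatever semantic measure GEAR's SLM provides, the three pattern classes, the weight `gamma`, the tool stubs in `TOOLS`, and the hard-coded `draft_answer` (which stands in for an SLM's preliminary answer) are all hypothetical.

```python
import math
import re
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_score(query: str, description: str) -> float:
    """Question-level score: word overlap between query and tool description.
    Assumption: a crude stand-in for an SLM-based semantic similarity."""
    def words(text: str) -> Counter:
        return Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return cosine(words(query), words(description))


def pattern_signature(answer: str) -> Counter:
    """Count coarse token classes in an answer (hypothetical pattern set)."""
    signature = Counter()
    for token in answer.split():
        if re.fullmatch(r"[-+]?\d+(\.\d+)?", token):
            signature["number"] += 1
        elif token.isascii():
            signature["ascii_word"] += 1
        else:
            signature["non_ascii_word"] += 1
    return signature


def pattern_score(draft_answer: str, tool_output: str) -> float:
    """Answer-level score: do the draft answer and tool output look alike?"""
    return cosine(pattern_signature(draft_answer), pattern_signature(tool_output))


def ground_tool(query, draft_answer, tools, gamma=0.5):
    """Return the best-scoring tool name and all scores.
    gamma weights the semantic score against the pattern score (assumed)."""
    scores = {}
    for name, (description, run) in tools.items():
        scores[name] = gamma * semantic_score(query, description) + (
            1 - gamma
        ) * pattern_score(draft_answer, run(query))
    return max(scores, key=scores.get), scores


# Hypothetical tool stubs: each returns a canned response for the demo query.
TOOLS = {
    "calculator": (
        "compute arithmetic such as plus minus times divided",
        lambda q: "1034",
    ),
    "wiki_search": (
        "search an encyclopedia for facts about people and places",
        lambda q: "Multiplication is an arithmetic operation",
    ),
}

query = "What is 47 times 22?"
draft_answer = "1000"  # stands in for a small model's rough (wrong) guess
best_tool, all_scores = ground_tool(query, draft_answer, TOOLS)
print(best_tool, all_scores)
```

The point of the answer-level score in this sketch is that the draft answer does not need to be correct: a numeric guess is enough to favor the tool whose output is also numeric. Only the selected tool would then be handed to the LLM for execution, which is where the savings in LLM calls come from.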