Abstract:Large Language Models (LLMs) have exhibited significant potential in performing diverse tasks, including the ability to call functions or use external tools to enhance their performance. While current research on function calling by LLMs primarily focuses on single-turn interactions, this paper addresses the overlooked necessity for LLMs to engage in multi-turn function calling--critical for handling compositional, real-world queries that require planning with functions but not only use functions. To facilitate this, we introduce an approach, BUTTON, which generates synthetic compositional instruction tuning data via bottom-up instruction construction and top-down trajectory generation. In the bottom-up phase, we generate simple atomic tasks based on real-world scenarios and build compositional tasks using heuristic strategies based on atomic tasks. Corresponding functions are then developed for these compositional tasks. The top-down phase features a multi-agent environment where interactions among simulated humans, assistants, and tools are utilized to gather multi-turn function calling trajectories. This approach ensures task compositionality and allows for effective function and trajectory generation by examining atomic tasks within compositional tasks. We produce a dataset BUTTONInstruct comprising 8k data points and demonstrate its effectiveness through extensive experiments across various LLMs.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the inability of large - language models (LLMs) to handle real - world compound queries that require multi - round function calls. Current research mainly focuses on single - round interactions, that is, how LLMs select appropriate functions and provide correct parameters. However, many practical user queries are complex and cannot be completed in a single step but require planning and execution of multiple function calls. For example, a task such as "Book my first flight from London to Edinburgh" requires first retrieving the flight schedule, finding the first flight, and then booking the ticket. Therefore, the paper focuses on constructing an instruction - fine - tuning dataset, where the input is complex compound queries and the output is the process of decomposing these queries into multi - round function calls, in order to improve the performance of LLMs in multi - round function calls. To achieve this goal, the authors propose a method named BUTTON, which is a "bottom - up and then top - down" pipeline for generating synthetic compound instruction - fine - tuning data. Specifically, this method first generates simple atomic tasks based on real - life scenarios and then uses heuristic strategies to construct compound tasks and their corresponding functions. In the "top - down" stage, a multi - agent environment is set up to simulate the interactions between humans, assistants, and tools, and multi - round function call trajectories are collected. Finally, these collected trajectories and the corresponding function definitions are populated into pre - defined prompt templates as instruction - fine - tuning data for LLMs. Through this method, the authors created a dataset named BUTTONInstruct containing 8,000 high - quality data points and verified its effectiveness through extensive experiments.

Facilitating Multi-turn Function Calling for LLMs via Compositional Instruction Tuning

Multi-Task Instruction Tuning of LLaMa for Specific Scenarios: A Preliminary Study on Writing Assistance

Enhancing Function-Calling Capabilities in LLMs: Strategies for Prompt Formats, Data Integration, and Multilingual Translation

Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models

Benchmarking Complex Instruction-Following with Multiple Constraints Composition

CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models

Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models

AgentTuning: Enabling Generalized Agent Abilities for LLMs

Multi-IF: Benchmarking LLMs on Multi-Turn and Multilingual Instructions Following

Demystifying Instruction Mixing for Fine-tuning Large Language Models

MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback

Small LLMs Are Weak Tool Learners: A Multi-LLM Agent

Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization

LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning

Towards Robust Instruction Tuning on Multimodal Large Language Models

Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model

Mixture-of-Instructions: Comprehensive Alignment of a Large Language Model through the Mixture of Diverse System Prompting Instructions

From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large Language Models

Balancing Accuracy and Efficiency in Multi-Turn Intent Classification for LLM-Powered Dialog Systems in Production