APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets

Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong

2024-06-27

Abstract: The advancement of function-calling agent models requires diverse, reliable, and high-quality datasets. This paper presents APIGen, an automated data generation pipeline designed to synthesize verifiable high-quality datasets for function-calling applications. We leverage APIGen and collect 3,673 executable APIs across 21 different categories to generate diverse function-calling datasets in a scalable and structured manner. Each data point in our dataset is verified through three hierarchical stages: format checking, actual function executions, and semantic verification, ensuring its reliability and correctness. We demonstrate that models trained with our curated datasets, even with only 7B parameters, can achieve state-of-the-art performance on the Berkeley Function-Calling Benchmark, outperforming multiple GPT-4 models. Moreover, our 1B model achieves exceptional performance, surpassing GPT-3.5-Turbo and Claude-3 Haiku. We release a dataset containing 60,000 high-quality entries, aiming to advance the field of function-calling agent domains. The dataset is available on Huggingface: https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k and the project homepage: https://apigen-pipeline.github.io/

Computation and Language, Artificial Intelligence, Machine Learning, Software Engineering

What problem does this paper attempt to address?

The paper addresses the challenges that large language models (LLMs) face in function-calling tasks, particularly the limited quality and diversity of current training datasets. It proposes APIGen, an automated pipeline for generating verifiable and diverse function-calling datasets, with the aim of improving LLM performance in real-world applications. Specifically, the main problems addressed by the paper include:

1. **Improving data quality**: Existing function-calling datasets often lack comprehensive validation, leading to potential inaccuracies or inefficiencies when models handle real-world application scenarios.
2. **Increasing data diversity**: To enable LLMs to better adapt to various APIs and application scenarios, it is necessary to create datasets that include a wide range of query types and APIs.
3. **Ensuring dataset scalability**: The data generation framework should be flexible and scalable so that API data from different sources can be integrated easily.

To address these issues, the paper makes the following contributions:

- **Proposing the APIGen framework**: This is an automated pipeline for generating high-quality, diverse function-calling datasets. It employs a multi-stage data validation process to ensure data accuracy and applicability.
- **Developing and testing function-calling models**: The researchers used datasets generated by APIGen to train function-calling models of different scales and demonstrated their strong performance on the Berkeley Function-Calling Benchmark.
- **Releasing a synthetic dataset**: The paper also publicly released a synthetic function-calling dataset containing 60,000 high-quality entries, covering 3,673 APIs across 21 categories, aiming to promote further research and development in the field of function-calling agents.

In summary, the goal of this paper is to improve the performance of LLMs in function-calling tasks by providing high-quality, diverse datasets, and to empirically demonstrate the effectiveness of the proposed solution.
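The three hierarchical verification stages named in the abstract (format checking, function execution, semantic verification) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The API registry, the `get_weather` function, the JSON call layout (`name` / `arguments`), and the keyword-based `toy_judge` are all hypothetical stand-ins chosen for illustration; how the real semantic verifier works is not described in this summary.

```python
import json
from typing import Any, Callable, Dict, List, Optional

# Hypothetical executable API, used only for illustration.
def get_weather(city: str) -> Dict[str, Any]:
    return {"city": city, "temp_c": 21}

# Hypothetical registry: API name -> (callable, required argument names).
API_REGISTRY = {"get_weather": (get_weather, {"city"})}

Call = Dict[str, Any]
Judge = Callable[[str, List[Call], List[Any]], bool]

def check_format(raw: str) -> Optional[List[Call]]:
    """Stage 1: the output must parse as a non-empty JSON list of known calls."""
    try:
        calls = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(calls, list) or not calls:
        return None
    for call in calls:
        if not isinstance(call, dict):
            return None
        name, args = call.get("name"), call.get("arguments")
        if not isinstance(name, str) or name not in API_REGISTRY:
            return None
        if not isinstance(args, dict) or not API_REGISTRY[name][1] <= set(args):
            return None
    return calls

def check_execution(calls: List[Call]) -> Optional[List[Any]]:
    """Stage 2: actually run each call; any exception rejects the sample."""
    results = []
    for call in calls:
        func = API_REGISTRY[call["name"]][0]
        try:
            results.append(func(**call["arguments"]))
        except Exception:
            return None
    return results

def toy_judge(query: str, calls: List[Call], results: List[Any]) -> bool:
    """Stage 3 stand-in (assumption): every argument value must appear in the query."""
    text = query.lower()
    return all(str(v).lower() in text for c in calls for v in c["arguments"].values())

def verify(query: str, raw: str, judge: Judge = toy_judge) -> str:
    """Run the stages in order; later stages only see samples that passed earlier ones."""
    calls = check_format(raw)
    if calls is None:
        return "rejected:format"
    results = check_execution(calls)
    if results is None:
        return "rejected:execution"
    if not judge(query, calls, results):
        return "rejected:semantic"
    return "accepted"
```

The ordering reflects the "hierarchical" idea: cheap structural checks run first, so only well-formed samples are executed, and only samples that execute cleanly reach the more expensive semantic check.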

ToolACE: Winning the Points of LLM Function Calling

APIReal: an API Recognition and Linking Approach for Online Developer Forums

API Pack: A Massive Multi-Programming Language Dataset for API Call Generation

A Comprehensive Framework for Evaluating API-oriented Code Generation in Large Language Models

APIGen: Generative API Method Recommendation

ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents

AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction

AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls

APITestGenie: Automated API Test Generation through Generative AI

ExploraCoder: Advancing code generation for multiple unseen APIs via planning and chained exploration

xLAM: A Family of Large Action Models to Empower AI Agent Systems

API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs

ToolCoder: Teach Code Generation Models to use API search tools

CAREER: Context-Aware API Recognition with Data Augmentation for API Knowledge Extraction

Code Generation for Collectible Card Games with Complex APIs

Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-Context Learning

Automatically Generating Task-Oriented API Learning Guide

Beyond Text: Unveiling Multimodal Proficiency of Large Language Models with MultiAPI Benchmark

Generative API Usage Code Recommendation with Parameter Concretization

Compositional API Recommendation for Library-Oriented Code Generation