Abstract: We introduce LArge Model Based Data Agent (LAMBDA), a novel open-source, code-free multi-agent data analysis system that leverages the power of large models. LAMBDA is designed to address data analysis challenges in complex data-driven applications through data agents that operate iteratively and generatively using natural language. At the core of LAMBDA are two key agent roles, the programmer and the inspector, which are engineered to work together. The programmer generates code based on the user's instructions and domain-specific knowledge, enhanced by advanced models, and the inspector debugs the code when necessary. To ensure robustness and handle adverse scenarios, LAMBDA features a user interface that allows direct user intervention in the operational loop. Additionally, LAMBDA can flexibly integrate external models and algorithms through our proposed Knowledge Integration Mechanism, catering to the needs of customized data analysis. LAMBDA has demonstrated strong performance on various data analysis tasks, as shown using real-world data examples. It has the potential to enhance data analysis paradigms by seamlessly integrating human and artificial intelligence, making data analysis more accessible, effective, and efficient for users from diverse backgrounds. Videos of several case studies are available at https://xxxlambda.github.io/lambda_webpage.
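The programmer and inspector roles described in the abstract can be illustrated with a minimal sketch. This is not LAMBDA's implementation: the function names (`run_code`, `solve`, `stub_programmer`, `stub_inspector`), the retry limit, and the `needs_human` fallback are all assumptions made for illustration, and the two agents are replaced by hard-coded stubs rather than calls to a large model.

```python
import traceback
from typing import Callable, Optional

# Hypothetical signatures: a programmer maps (instruction, advice) to code,
# an inspector maps (code, error text) to debugging advice.
Programmer = Callable[[str, Optional[str]], str]
Inspector = Callable[[str, str], str]


def run_code(code: str) -> Optional[str]:
    """Execute code in a fresh namespace; return None on success or the traceback text."""
    try:
        exec(code, {})
        return None
    except Exception:
        return traceback.format_exc()


def solve(instruction: str, programmer: Programmer, inspector: Inspector,
          max_attempts: int = 3) -> dict:
    """Alternate between programmer and inspector until the code runs or attempts run out."""
    advice: Optional[str] = None
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = programmer(instruction, advice)
        error = run_code(code)
        if error is None:
            return {"status": "ok", "code": code, "attempts": attempt}
        # The inspector only acts when execution fails.
        advice = inspector(code, error)
    # Assumed fallback: hand control to the user, mirroring human intervention in the loop.
    return {"status": "needs_human", "code": code, "attempts": max_attempts}


def stub_programmer(instruction: str, advice: Optional[str]) -> str:
    """Stand-in for a model call: the first draft is buggy, the revision is not."""
    if advice is None:
        return "result = total / count"
    return "total, count = 10, 4\nresult = total / count"


def stub_inspector(code: str, error: str) -> str:
    """Stand-in for a model call: turn the last traceback line into advice."""
    return "Define variables before use. Error was: " + error.strip().splitlines()[-1]


outcome = solve("compute the mean", stub_programmer, stub_inspector)
print(outcome["status"], outcome["attempts"])
```

In this sketch the first draft fails with a `NameError`, the inspector's advice is passed back to the programmer, and the second draft runs. A real system would sandbox execution rather than call `exec` directly.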

What problem does this paper attempt to address?

### Problems the Paper Aims to Solve

This paper introduces LAMBDA (Large Model Based Data Agent), which addresses key challenges in current data-driven applications, particularly complex data analysis tasks in statistics and data science. Specifically, LAMBDA targets the following issues:

1. **Overcoming Programming Barriers**: Programming knowledge has long been a major obstacle for domain experts without a background in computer science or statistics who want to use powerful AI tools for data analysis. LAMBDA interacts with users through natural language instructions, providing a no-code experience that lowers the entry barrier and enables professionals from various disciplines to perform data analysis and data mining more efficiently.
2. **Integrating Human Intelligence and Artificial Intelligence**: The current data analysis paradigm lacks an effective intermediary between human intelligence and artificial intelligence. On one hand, AI models often lack the domain knowledge required for specific tasks because it was not part of their training; on the other hand, domain experts find it difficult to integrate their expertise into AI models to enhance performance. Through well-designed interfaces and key-value (KV) knowledge bases, LAMBDA allows the agent to access external resources such as algorithms or models, so that domain-specific knowledge is integrated and the agent handles complex tasks with greater accuracy and relevance.
3. **Reshaping Data Science Education**: LAMBDA has the potential to become an interactive platform that transforms statistics and data science education. It lets educators flexibly adjust teaching plans and integrate the latest research findings, making it a valuable tool for delivering cutting-edge, personalized learning experiences.
4. **Enhancing Reliability and Portability**: LAMBDA emphasizes the reliability and portability of the system. Reliability refers to LAMBDA's ability to handle data analysis tasks stably and resolve faults automatically. Portability ensures that LAMBDA is compatible with various large language models (LLMs), allowing it to keep leveraging the latest models.

Through these designs, LAMBDA aims to create a much-needed medium that facilitates seamless interaction between domain knowledge and AI capabilities, thereby achieving more efficient and effective data analysis in statistics and data science.
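To make the key-value knowledge base idea from the second point concrete, here is a minimal sketch under stated assumptions. The source only says that LAMBDA uses KV knowledge bases to give the agent access to external algorithms or models; the `KnowledgeBase` class, the word-overlap (Jaccard) matching, the threshold, and the `build_prompt` helper below are hypothetical choices made for illustration, not the paper's mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


def _tokens(text: str) -> Set[str]:
    """Lowercase whitespace tokenization; a deliberately crude stand-in for real retrieval."""
    return set(text.lower().split())


@dataclass
class KnowledgeBase:
    """Hypothetical store: key = description of a resource, value = its code."""
    entries: Dict[str, str] = field(default_factory=dict)

    def add(self, description: str, code: str) -> None:
        self.entries[description] = code

    def lookup(self, instruction: str, threshold: float = 0.2) -> Optional[str]:
        """Return the code whose description best overlaps the instruction, if any."""
        query = _tokens(instruction)
        best_key, best_score = None, 0.0
        for key in self.entries:
            key_tokens = _tokens(key)
            union = query | key_tokens
            score = len(query & key_tokens) / len(union) if union else 0.0
            if score > best_score:
                best_key, best_score = key, score
        if best_key is not None and best_score >= threshold:
            return self.entries[best_key]
        return None


def build_prompt(instruction: str, kb: KnowledgeBase) -> str:
    """Prepend matched domain knowledge to the instruction sent to the programmer agent."""
    knowledge = kb.lookup(instruction)
    if knowledge is None:
        return instruction
    return f"Use this provided code where relevant:\n{knowledge}\n\nTask: {instruction}"


kb = KnowledgeBase()
# Illustrative entry only; the description and function are made up for this example.
kb.add("robust median-based outlier removal",
       "def remove_outliers(values):\n    ...")

print(build_prompt("apply median-based outlier removal to the sales column", kb))
```

A domain expert registers a resource once under a plain-language description, and later instructions that resemble that description pull the resource into the agent's context. An embedding-based similarity search would be a natural replacement for the word-overlap score used here.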

LAMBDA: A Large Model Based Data Agent
