Abstract: We introduce Agent K v1.0, an end-to-end autonomous data science agent designed to automate, optimise, and generalise across diverse data science tasks. Fully automated, Agent K v1.0 manages the entire data science life cycle by learning from experience. It leverages a highly flexible structured reasoning framework that lets it dynamically process memory in a nested structure, learning from stored, accumulated experience to handle complex reasoning tasks. It optimises long- and short-term memory by selectively storing and retrieving key information, guiding future decisions based on environmental rewards. This iterative approach allows it to refine decisions without fine-tuning or backpropagation, achieving continuous improvement through experiential learning. We evaluate our agent's capabilities using Kaggle competitions as a case study. Following a fully automated protocol, Agent K v1.0 systematically addresses complex and multimodal data science tasks, employing Bayesian optimisation for hyperparameter tuning and feature engineering. Our new evaluation framework rigorously assesses Agent K v1.0's end-to-end capabilities to generate and send submissions starting from a Kaggle competition URL. Results demonstrate that Agent K v1.0 achieves a 92.5% success rate across tasks, spanning tabular, computer vision, NLP, and multimodal domains. When benchmarking against 5,856 human Kaggle competitors by calculating Elo-MMR scores for each, Agent K v1.0 ranks in the top 38%, demonstrating an overall skill level comparable to Expert-level users. Notably, its Elo-MMR score falls between the first and third quartiles of scores achieved by human Grandmasters. Furthermore, our results indicate that Agent K v1.0 has reached a performance level equivalent to Kaggle Grandmaster, with a record of 6 gold, 3 silver, and 7 bronze medals, as defined by Kaggle's progression system.
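To make the "top 38%" style of statement concrete, here is a minimal sketch of how a rating can be turned into a rank share within a field of competitors. It does not compute Elo-MMR; it assumes each competitor already has a single numeric rating. The function name `top_fraction`, the choice to count the agent as part of the field, and all score values are illustrative assumptions, not taken from the paper.

```python
def top_fraction(agent_score, human_scores):
    """Share of the field (humans plus the agent) ranked at or above the agent.

    Assumption: a higher rating is better and ties are resolved in the
    agent's favour.
    """
    better = sum(1 for s in human_scores if s > agent_score)
    return (better + 1) / (len(human_scores) + 1)


# Toy ratings, invented purely for illustration.
human_scores = [1500.0, 1420.0, 1390.0, 1310.0, 1250.0, 1200.0, 1100.0]
share = top_fraction(1400.0, human_scores)
print(f"top {share:.1%}")  # prints "top 37.5%"
```

With two of seven humans rated above the agent, the agent sits third in a field of eight, hence 3/8 = 37.5%.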

What problem does this paper attempt to address?

The problem that this paper attempts to solve is to construct an end-to-end autonomous data science agent that can automate, optimize, and generalize the processing of various data science tasks. Specifically, the paper introduces a new model named Agent K v1.0, which aims to manage the entire data science life cycle by learning from experience, thereby overcoming the limitations of traditional methods in flexibility and feedback integration.

### Main Problems

1. **Automation and Optimization**: The dynamic nature of the data science workflow requires continuous monitoring and adaptation to real-time data changes, which makes automation difficult. For example, accurate data cleaning and feature engineering are essential before developing any machine-learning model, but the solutions must be customized for each case.
2. **Optimization Process**: The optimization process involves multiple steps, including feature selection, model training, hyperparameter tuning, and evaluation, all of which operate in a large search space. Moreover, data scientists usually need to spend a great deal of time evaluating the impact of their choices because the pipeline must process a large amount of data and perform complex calculations to return performance metrics.

### Solutions

To address the above challenges, the paper proposes a flexible learning-inference paradigm that eliminates the need for backpropagation and fine-tuning, enabling the model to learn and adapt from experience. Specifically:

- **Structured Inference**: A memory module is introduced, which can dynamically utilize past successes and failures to achieve more adaptive learning. This memory module allows the agent to store past experiences and dynamically adjust strategies according to environmental feedback without retraining.
- **Multimodal Capability**: Agent K v1.0 is able to handle multiple data modalities, including tabular data, computer vision, natural language processing, and multimodal tasks.
- **Automated Task Setup**: The agent can start from a Kaggle competition URL, automatically set up data science tasks, generate complex code for data cleaning, feature engineering, model creation, and optimized training, and finally automatically generate a submission file and decide whether to submit it to Kaggle to obtain a score.

### Evaluation

The paper evaluates the capabilities of Agent K v1.0 through Kaggle competitions. The results show that Agent K v1.0 achieves a 92.5% success rate in tasks in multiple fields, and its performance is equivalent to the Kaggle Grandmaster level, obtaining 6 gold medals, 3 silver medals, and 7 bronze medals. In addition, the Elo-MMR score of Agent K v1.0 ranks in the top 38% among 5,856 human competitors, demonstrating its excellent performance in data science tasks.

### Summary

The main contribution of the paper is to propose a new learning-inference paradigm. Through the structured memory module, large language models can learn and adapt from experience without relying on backpropagation and fine-tuning. This method significantly improves the automation and optimization levels of data science tasks and shows great potential in practical applications.
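The memory idea described under Solutions, storing past successes and failures and consulting them by reward instead of retraining, can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the names `Experience` and `ExperienceMemory`, the rule of keeping only the best and worst attempt per task type, and the reward values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Experience:
    """One past attempt: the task type, what was tried, and the reward received."""
    task: str
    action: str
    reward: float


@dataclass
class ExperienceMemory:
    """Hypothetical reward-guided memory; no model weights are updated."""
    long_term: dict = field(default_factory=dict)  # task -> [worst, ..., best]

    def store(self, exp: Experience) -> None:
        # Selective storage (assumed rule): per task, keep only the lowest-
        # and highest-reward attempts, one failure and one success.
        kept = self.long_term.get(exp.task, []) + [exp]
        kept.sort(key=lambda e: e.reward)
        self.long_term[exp.task] = kept if len(kept) <= 2 else [kept[0], kept[-1]]

    def retrieve(self, task: str):
        # Returns (best, worst) for the task, or (None, None) if unseen.
        kept = self.long_term.get(task, [])
        if not kept:
            return None, None
        return kept[-1], kept[0]


memory = ExperienceMemory()
memory.store(Experience("tabular", "gradient boosting", 0.81))
memory.store(Experience("tabular", "linear model", 0.64))
memory.store(Experience("tabular", "no feature scaling", 0.12))

best, worst = memory.retrieve("tabular")
print("repeat:", best.action)
print("avoid:", worst.action)
```

In an agent loop, the retrieved pair would typically be placed in the prompt for the next attempt, so that behaviour changes through the stored context and not through gradient updates.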

Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level
