Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Antoine Grosnit, Alexandre Maraval, James Doran, Giuseppe Paolo, Albert Thomas, Refinath Shahul Hameed Nabeezath Beevi, Jonas Gonzalez, Khyati Khandelwal, Ignacio Iacobacci, Abdelhakim Benechehab, Hamza Cherkaoui, Youssef Attia El-Hili, Kun Shao, Jianye Hao, Jun Yao, Balazs Kegl, Haitham Bou-Ammar, Jun Wang
2024-11-06
Abstract: We introduce Agent K v1.0, an end-to-end autonomous data science agent designed to automate, optimise, and generalise across diverse data science tasks. Fully automated, Agent K v1.0 manages the entire data science life cycle by learning from experience. It leverages a highly flexible structured reasoning framework to enable it to dynamically process memory in a nested structure, effectively learning from accumulated experience stored to handle complex reasoning tasks. It optimises long- and short-term memory by selectively storing and retrieving key information, guiding future decisions based on environmental rewards. This iterative approach allows it to refine decisions without fine-tuning or backpropagation, achieving continuous improvement through experiential learning. We evaluate our agent's capabilities using Kaggle competitions as a case study. Following a fully automated protocol, Agent K v1.0 systematically addresses complex and multimodal data science tasks, employing Bayesian optimisation for hyperparameter tuning and feature engineering. Our new evaluation framework rigorously assesses Agent K v1.0's end-to-end capabilities to generate and send submissions starting from a Kaggle competition URL. Results demonstrate that Agent K v1.0 achieves a 92.5% success rate across tasks, spanning tabular, computer vision, NLP, and multimodal domains. When benchmarking against 5,856 human Kaggle competitors by calculating Elo-MMR scores for each, Agent K v1.0 ranks in the top 38%, demonstrating an overall skill level comparable to Expert-level users. Notably, its Elo-MMR score falls between the first and third quartiles of scores achieved by human Grandmasters. Furthermore, our results indicate that Agent K v1.0 has reached a performance level equivalent to Kaggle Grandmaster, with a record of 6 gold, 3 silver, and 7 bronze medals, as defined by Kaggle's progression system.
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper aims to build an end-to-end autonomous data science agent that can automate, optimize, and generalize across diverse data science tasks. It introduces Agent K v1.0, an agent that manages the entire data science life cycle by learning from experience, addressing the limited flexibility and feedback integration of traditional methods.

### Main Problems

1. **Automation and Optimization**: Data science workflows are dynamic and require continuous monitoring and adaptation to real-time data changes, which makes them hard to automate. For example, accurate data cleaning and feature engineering are essential before developing any machine-learning model, but the solutions must be customized for each case.
2. **Optimization Process**: Optimization involves multiple steps, including feature selection, model training, hyperparameter tuning, and evaluation, all of which operate in a large search space. Data scientists also spend a great deal of time evaluating the impact of their choices, because the pipeline must process a large amount of data and perform complex calculations before it returns performance metrics.

### Solutions

To address these challenges, the paper proposes a flexible learning-inference paradigm that removes the need for backpropagation and fine-tuning, so the agent learns and adapts from experience alone. Specifically:

- **Structured Reasoning**: A memory module stores past successes and failures, and the agent draws on them to adjust its strategy according to environmental feedback without retraining.
- **Multimodal Capability**: Agent K v1.0 handles multiple data modalities, including tabular data, computer vision, natural language processing, and multimodal tasks.
- **Automated Task Setup**: Starting from a Kaggle competition URL, the agent sets up the data science task automatically; generates code for data cleaning, feature engineering, model creation, and optimized training; and finally produces a submission file and decides whether to submit it to Kaggle to obtain a score.

### Evaluation

The paper evaluates Agent K v1.0 on Kaggle competitions. The agent achieves a 92.5% success rate across tasks in multiple domains, and its record of 6 gold, 3 silver, and 7 bronze medals corresponds to Kaggle Grandmaster level under Kaggle's progression system. Its Elo-MMR score places it in the top 38% of 5,856 human competitors.

### Summary

The main contribution of the paper is a new learning-inference paradigm in which a structured memory module lets large language models learn and adapt from experience without backpropagation or fine-tuning. On the reported Kaggle results, this approach raises the level of automation and optimization achievable in data science tasks.
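
To make the memory-driven adaptation described under Solutions more concrete, here is a minimal sketch of the general idea: an agent that improves by storing and retrieving rewarded experiences instead of updating model weights. It is not Agent K v1.0's implementation. Every name in it (`Experience`, `ExperienceMemory`, `solve`, the task label, and the example scores) is a hypothetical stand-in, and the two-tier layout with a reward threshold is an assumed simplification of "selectively storing and retrieving key information".

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    task: str      # hypothetical task label, e.g. "tabular"
    action: str    # hypothetical candidate pipeline name
    reward: float  # environment feedback, e.g. a validation score


class ExperienceMemory:
    """Toy two-tier memory: recent attempts plus selectively kept good ones."""

    def __init__(self, short_capacity=3, keep_threshold=0.5):
        # Short-term memory is a bounded buffer; the oldest entry falls out.
        self.short_term = deque(maxlen=short_capacity)
        # Long-term memory keeps only experiences whose reward is high enough.
        self.long_term = []
        self.keep_threshold = keep_threshold

    def store(self, exp):
        self.short_term.append(exp)
        if exp.reward >= self.keep_threshold:  # selective storage (assumed rule)
            self.long_term.append(exp)

    def retrieve(self, task, k=1):
        """Return the k highest-reward remembered experiences for a task."""
        pool = {e for e in self.long_term if e.task == task}
        pool |= {e for e in self.short_term if e.task == task}
        return sorted(pool, key=lambda e: e.reward, reverse=True)[:k]


def solve(memory, task, candidates, evaluate):
    """Try each candidate once, record its reward, return the best remembered."""
    for action in candidates:
        memory.store(Experience(task, action, evaluate(action)))
    best = memory.retrieve(task, k=1)
    return best[0].action if best else None


# Made-up validation scores standing in for environment rewards.
scores = {"baseline": 0.40, "gbdt": 0.80, "gbdt_tuned": 0.85, "linear": 0.30}
memory = ExperienceMemory(short_capacity=2, keep_threshold=0.5)
best = solve(memory, "tabular", list(scores), scores.get)
print(best)  # gbdt_tuned
```

Nothing in this loop is trained: the only state that changes between attempts is the memory, which is what lets later decisions depend on earlier rewards without fine-tuning.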