Iterative Experience Refinement of Software-Developing Agents

Chen Qian, Jiahao Li, Yufan Dang, Wei Liu, YiFei Wang, Zihao Xie, Weize Chen, Cheng Yang, Yingli Zhang, Zhiyuan Liu, Maosong Sun
2024-05-07
Abstract: Autonomous agents powered by large language models (LLMs) show significant potential for achieving high autonomy in various scenarios such as software development. Recent research has shown that LLM agents can leverage past experiences to reduce errors and enhance efficiency. However, the static experience paradigm, reliant on a fixed collection of past experiences acquired heuristically, lacks iterative refinement and thus hampers agents' adaptability. In this paper, we introduce the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution. We propose two fundamental patterns: the successive pattern, refining based on nearest experiences within a task batch, and the cumulative pattern, acquiring experiences across all previous task batches. Augmented with our heuristic experience elimination, the method prioritizes high-quality and frequently-used experiences, effectively managing the experience space and enhancing efficiency. Extensive experiments show that while the successive pattern may yield superior results, the cumulative pattern provides more stable performance. Moreover, experience elimination facilitates achieving better performance using just 11.54% of a high-quality subset.
Computation and Language, Artificial Intelligence, Multiagent Systems, Software Engineering
What problem does this paper attempt to address?
### Problems the paper attempts to solve

The paper addresses how autonomous agents based on large language models (LLMs) can effectively use and iteratively improve past experiences while performing tasks. Existing methods rely on a static experience paradigm: a fixed set of historical experiences is collected once and then used to guide all future task execution. Because this paradigm has no iterative refinement mechanism, agents adapt poorly to complex tasks such as software development.

To address this, the paper introduces the **Iterative Experience Refinement (IER) framework**, which enables agents to dynamically acquire, utilize, and eliminate experiences during task execution. The framework is implemented through two basic patterns:

1. **Successive Pattern**: Refines experiences based on those from the most recent task batch.
2. **Cumulative Pattern**: Integrates experiences from all previous task batches.

In addition, to keep the experience space from growing without bound, the paper proposes a heuristic experience elimination mechanism that prioritizes retaining high-quality and frequently used experiences, which makes experience management more efficient.

### Main contributions

1. **Iterative experience refinement framework**: Proposed for the first time, this framework enables agents to adaptively solve new tasks by dynamically acquiring, utilizing, and eliminating experiences.
2. **Experience elimination mechanism**: Prioritizes retaining high-quality and frequently used experiences, reducing the inefficiency caused by an expanding experience space.
3. **Experimental verification**: Extensive experiments show that the successive pattern may perform better on some metrics, while the cumulative pattern provides more stable performance. The experience elimination mechanism also achieves better performance while retaining only a high-quality subset amounting to 11.54% of the experiences.

### Methodology

- **Experience acquisition and utilization**: Through multi-round interactions of instructions and solutions, record and extract effective "shortcut" experiences.
- **Experience propagation**: Through the successive and cumulative patterns, transfer experiences from one task batch to the next.
- **Experience elimination**: Based on information density and usage frequency, eliminate low-quality experiences and retain high-quality ones.

### Experimental evaluation

- **Baseline methods**: GPTEngineer, MetaGPT, ChatDev, and ECL, among others.
- **Dataset**: The SRDD dataset, which contains 1,200 software requirement descriptions and is divided into 6 task batches.
- **Evaluation metrics**:
  - Completeness
  - Executability
  - Consistency
  - Quality

The experimental results show that the IER framework significantly outperforms the baseline methods on multiple metrics, and improves the quality and efficiency of software generation without significantly increasing task execution time.
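
The propagation patterns and the elimination step described under Methodology can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Experience` fields, the `quality` score (a stand-in for the information-based criterion), the thresholds, the `toy_acquire` helper, and the rule that an experience must pass both the quality and usage checks are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Experience:
    """A stored "shortcut" (hypothetical structure, not the paper's schema)."""
    shortcut: str
    quality: float  # assumed stand-in for an information-based quality score
    uses: int = 0   # how many times the experience has been retrieved


# An acquisition function maps (task batch, current pool) to new experiences.
Acquire = Callable[[Sequence[str], List[Experience]], List[Experience]]


def propagate(batches: Sequence[Sequence[str]], acquire: Acquire,
              pattern: str) -> List[Experience]:
    """Carry experiences from one task batch to the next."""
    pool: List[Experience] = []
    for batch in batches:
        fresh = acquire(batch, pool)  # the batch is solved using the current pool
        if pattern == "successive":
            pool = fresh              # keep only the most recent batch's experiences
        elif pattern == "cumulative":
            pool = pool + fresh       # keep experiences from all batches so far
        else:
            raise ValueError(f"unknown pattern: {pattern!r}")
    return pool


def eliminate(pool: List[Experience], min_quality: float,
              min_uses: int) -> List[Experience]:
    """Retain experiences that are both high quality and frequently used
    (combining the two criteria with AND is an assumption of this sketch)."""
    return [e for e in pool if e.quality >= min_quality and e.uses >= min_uses]


def toy_acquire(batch: Sequence[str], pool: List[Experience]) -> List[Experience]:
    """Hypothetical acquisition step: one placeholder experience per task."""
    return [Experience(shortcut=f"{task}-fix", quality=0.5) for task in batch]
```

With three toy batches of sizes 2, 1, and 3, the successive pattern ends with only the 3 experiences from the last batch, while the cumulative pattern ends with all 6. Applying `eliminate` to either pool then shrinks it to the subset that clears both thresholds, which is the role the summary attributes to experience elimination.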