From Language Models to Practical Self-Improving Computer Agents

Alex Sheng
2024-04-18
Abstract:We develop a simple and straightforward methodology to create AI computer agents that can carry out diverse computer tasks and self-improve by developing tools and augmentations to enable themselves to solve increasingly complex tasks. As large language models (LLMs) have been shown to benefit from non-parametric augmentations, a significant body of recent work has focused on developing software that augments LLMs with various capabilities. Rather than manually developing static software to augment LLMs through human engineering effort, we propose that an LLM agent can systematically generate software to augment itself. We show, through a few case studies, that a minimal querying loop with appropriate prompt engineering allows an LLM to generate and use various augmentations, freely extending its own capabilities to carry out real-world computer tasks. Starting with only terminal access, we prompt an LLM agent to augment itself with retrieval, internet search, web navigation, and text editor capabilities. The agent effectively uses these various tools to solve problems including automated software development and web-based tasks.
Artificial Intelligence
What problem does this paper attempt to address?
### The problems the paper attempts to solve This paper aims to solve the problem of how to create AI computer agents that can perform diverse computer tasks and self - improve. Specifically, the author proposes a method that enables large - language models (LLMs) to expand their capabilities through generating tools and enhancements, thereby solving increasingly complex tasks. The main contributions of the paper are as follows: 1. **Self - enhancing AI agents**: The author proposes a simple and straightforward method that enables LLMs to self - improve through generating tools and enhancements. These tools and enhancements can help agents solve more complex tasks. 2. **Automated tool generation**: Unlike manually developing static software to enhance LLMs, the author proposes an algorithmic loop that enables LLMs to automatically generate and use various enhancements through appropriate prompt engineering. 3. **Practical application cases**: The paper demonstrates the effectiveness of this method through several case studies. For example, starting with only terminal access rights, the author prompts the LLM agent to generate functions such as retrieval, Internet search, web page navigation, and text editing. 4. **Recursive self - improvement**: Through these automatically generated tools, agents can further expand their capabilities, solve more complex tasks, and even create more advanced tools to further enhance themselves. ### Abstract We have developed a simple and straightforward method to create AI computer agents that can perform diverse computer tasks and self - improve. Through several case studies, we show how the minimum query loop and appropriate prompt engineering enable LLMs to generate and use various enhancements, freely expanding their own capabilities to complete computer tasks in the real world. Starting with only terminal access rights, we prompt the LLM agent to generate functions such as retrieval, Internet search, web page navigation, and text editing, and effectively use these tools to solve problems including automated software development and web - based tasks. ### Introduction After pre - training on Internet - scale data, large - language models (LLMs) exhibit certain emergent reasoning abilities. This enables them to interact with external software tools, which interface with LLMs through input and output pipelines. This paper aims to better understand and systematize these model - enhanced software and proposes a method for automated development of model - enhanced software by creating agents that can autonomously generate tools and enhancements. These self - improving agents can flexibly solve diverse computer tasks, generate software to enhance themselves, and complete complex tasks that could not be initially solved. ### Method We introduce a method for constructing and operating general - purpose computer agents driven by LLMs. These agents can self - improve by generating tools. The system architecture includes an instruction - tuned LLM agent that can generate code and execute terminal commands in a computer environment. Through an algorithmic loop of continuous querying of the model, the agent receives tasks defined by human input and generates code and actions (in the form of terminal commands) to complete the tasks. Each time the model responds to a query, its output is parsed by the system, and the generated code blocks and terminal commands are stored and executed respectively. ### Experiments We conducted three case studies to demonstrate the flexibility of self - improving AI agents. These cases include: 1. Creating file viewing and editing tools for software development tasks. 2. Independently developing basic retrieval - enhancing tools. 3. Collaborating with human users to create Internet search and retrieval tools and enhance themselves to solve web - based tasks. Through these experiments, we show how agents start from basic tools, gradually generate more complex tools, and finally solve increasingly complex tasks.