Automatic Engineering of Long Prompts

Cho-Jui Hsieh, Si Si, Felix X. Yu, Inderjit S. Dhillon
2023-11-16
Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in solving complex open-domain tasks, guided by comprehensive instructions and demonstrations provided in the form of prompts. However, these prompts can be lengthy, often comprising hundreds of lines and thousands of tokens, and their design often requires considerable human effort. Recent research has explored automatic prompt engineering for short prompts, typically consisting of one or a few sentences. However, the automatic design of long prompts remains a challenging problem due to its immense search space. In this paper, we investigate the performance of greedy algorithms and genetic algorithms for automatic long prompt engineering. We demonstrate that a simple greedy approach with beam search outperforms other methods in terms of search efficiency. Moreover, we introduce two novel techniques that utilize search history to enhance the effectiveness of LLM-based mutation in our search algorithm. Our results show that the proposed automatic long prompt engineering algorithm achieves an average of 9.2% accuracy gain on eight tasks in Big Bench Hard, highlighting the significance of automating prompt designs to fully harness the capabilities of LLMs.
Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The paper aims to address the problem of automatic long prompt engineering. Specifically, it focuses on how to automatically find better long prompts (usually containing thousands of tokens) and how much performance improvement can be achieved by adjusting long prompts. Current research mainly focuses on the automatic optimization of short prompts, while the automatic design of long prompts remains challenging due to its vast search space. To tackle this issue, the authors propose a method that combines a greedy algorithm with beam search and introduces two novel techniques to leverage search history to enhance the effectiveness of mutation operations based on large language models (LLMs).

The main contributions of the paper include:
1. Formally discussing the problem of automatic long prompt engineering for the first time and demonstrating significant performance improvements across multiple tasks.
2. Proposing a greedy algorithm with beam search that can quickly optimize prompts and introducing a new guided mutation method to improve convergence speed.
3. Conducting experiments on the Big Bench Hard benchmark, showing that the proposed automatic long prompt engineering method can significantly enhance performance, with an average accuracy improvement of 9.2% across eight selected tasks.

By automating the optimization of long prompts, the researchers hope to further unleash the capabilities of large language models, making them more efficient and reliable in handling complex tasks.
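
To make the idea of a greedy search with a beam more concrete, the following is a minimal sketch and not the authors' implementation. It assumes the long prompt is stored as a list of sentences, that one sentence is rewritten per step, and that the beam keeps the highest-scoring candidates seen so far. The names `beam_search_prompt`, `mutate`, and `score` are hypothetical: `mutate` stands in for an LLM call that rewrites a sentence, and `score` stands in for accuracy on a training set. The history-guided mutation techniques mentioned above are not modeled here.

```python
import random
from typing import Callable, List, Tuple

# Assumption: a long prompt is represented as a list of sentences.
Prompt = List[str]


def beam_search_prompt(
    initial: Prompt,
    mutate: Callable[[str, random.Random], str],  # hypothetical stand-in for an LLM rewrite
    score: Callable[[Prompt], float],             # hypothetical stand-in for training accuracy
    beam_width: int = 4,
    iterations: int = 50,
    seed: int = 0,
) -> Tuple[Prompt, float]:
    """Greedy search that keeps a small pool (beam) of the best prompts found."""
    rng = random.Random(seed)
    beam = [(score(initial), list(initial))]
    for _ in range(iterations):
        # Pick a parent prompt from the beam and one sentence to rewrite.
        _, parent = rng.choice(beam)
        idx = rng.randrange(len(parent))
        child = list(parent)
        child[idx] = mutate(parent[idx], rng)
        # Score the candidate, then keep only the top `beam_width` prompts.
        beam.append((score(child), child))
        beam.sort(key=lambda pair: pair[0], reverse=True)
        del beam[beam_width:]
    best_score, best_prompt = beam[0]
    return best_prompt, best_score
```

Because the top-scoring prompt is never dropped from the beam, the returned score cannot fall below the score of the initial prompt. In practice each call to `score` would require running the LLM on a set of examples, so the number of iterations is the main cost to budget.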