Mothman at SemEval-2024 Task 9: An Iterative System for Chain-of-Thought Prompt Optimization

Alvin Po-Chun Chen, Ray Groshan, Sean von Bayern
2024-05-04
Abstract: Extensive research exists on the performance of large language models on logic-based tasks, whereas relatively little has been done on their ability to generate creative solutions on lateral thinking tasks. The BrainTeaser shared task tests lateral thinking and uses adversarial datasets to prevent memorization, resulting in poor performance for out-of-the-box models. We propose a system for iterative, chain-of-thought prompt engineering which optimizes prompts using human evaluation. Using this shared task, we demonstrate our system's ability to significantly improve model performance by optimizing prompts and evaluate the input dataset.
Computation and Language
### What problem does this paper attempt to solve?

This paper addresses the poor performance of large language models (LLMs) on tasks that require lateral thinking. Specifically, the authors focus on improving the ability of LLMs to solve creative, non-traditional problems by optimizing Chain-of-Thought (CoT) prompts.

#### Background and problem description

1. **Limitations of existing research**:
   - Current research mainly focuses on the performance of LLMs on logical reasoning tasks; relatively little has been done on lateral thinking tasks.
   - Lateral thinking tasks require the model to solve problems creatively, "out of the box", which remains a challenge for existing LLMs.
2. **BrainTeaser shared task**:
   - The BrainTeaser shared task defines two subtasks, sentence puzzles and word puzzles, which test and evaluate a model's lateral thinking ability.
   - The task uses adversarial datasets to prevent models from relying on memorization rather than reasoning, so out-of-the-box models perform poorly on it.
3. **Core of the problem**:
   - How to optimize CoT prompts so that LLMs can better understand and solve problems that require creative thinking.
   - How to use iterative prompt optimization, combined with human evaluation, to identify the specific challenges the model encounters during reasoning and to revise the prompts to target them.

#### Proposed solutions

The authors propose a method for iteratively optimizing CoT prompts, which consists of the following steps:

1. **Randomly sample training data and generate initial CoT prompts**:
   - Randomly select samples from the training set and use them to generate initial CoT prompts.
2. **Identify different categories in output reasoning and divide training data**:
   - Divide the training data into categories according to the type of reasoning the model outputs.
3. **Independent human evaluation**:
   - Conduct independent human evaluation on the data in each category to identify specific reasoning challenges.
4. **Develop new CoT prompts based on evaluation results**:
   - Use the results of the human evaluation to develop new CoT prompts, paying particular attention to answer options that are likely to mislead the model.
5. **Optional data collection / synthesis improvement**:
   - Identify deficiencies in the data and provide guidance for future data collection and synthesis.

With this method, the authors not only improve model performance on the adversarial datasets but also provide insights for the creation of future datasets.
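
The steps above can be read as a loop: sample, run the current prompt, group the outputs by reasoning type, review each group, and revise the prompt. The following is a minimal sketch of that loop, not the authors' implementation. Every name in it (`optimize_prompt`, `run_model`, `categorize`, `human_review`, `BASE_PROMPT`, `HINT`) and the toy training examples are hypothetical. The model call and the human evaluation are replaced by stub callbacks so the sketch runs on its own, and the optional data collection step is not modeled.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Example = Dict[str, str]  # hypothetical record: {"question": ..., "answer": ...}
Reviewed = List[Tuple[Example, str]]  # (example, model reasoning) pairs


def optimize_prompt(
    train: List[Example],
    prompt: str,
    run_model: Callable[[str, Example], str],
    categorize: Callable[[str], str],
    human_review: Callable[[str, Reviewed], str],
    rounds: int = 3,
    sample_size: int = 4,
    seed: int = 0,
) -> str:
    """Iteratively extend a CoT prompt with reviewer notes (illustrative only)."""
    rng = random.Random(seed)
    for _ in range(rounds):
        # Step 1: randomly sample training data and run the current prompt.
        sample = rng.sample(train, min(sample_size, len(train)))
        outputs = [(ex, run_model(prompt, ex)) for ex in sample]

        # Step 2: divide the sampled data by the type of reasoning produced.
        groups: Dict[str, Reviewed] = defaultdict(list)
        for ex, reasoning in outputs:
            groups[categorize(reasoning)].append((ex, reasoning))

        # Step 3: review each category independently (a callback stands in
        # for the human evaluator; it returns a note or an empty string).
        notes = [human_review(cat, items) for cat, items in sorted(groups.items())]

        # Step 4: fold new, non-empty notes into the next prompt.
        new_notes = [n for n in notes if n and n not in prompt]
        if not new_notes:
            break  # nothing left to fix in this sample
        prompt = prompt + "\n" + "\n".join(new_notes)
    return prompt


# --- Toy demonstration with stub callbacks (all values are made up) ---
BASE_PROMPT = "Think step by step before choosing an answer."
HINT = "Question the default assumption behind each option."

toy_train: List[Example] = [
    {"question": "toy puzzle 1", "answer": "A"},
    {"question": "toy puzzle 2", "answer": "B"},
    {"question": "toy puzzle 3", "answer": "C"},
    {"question": "toy puzzle 4", "answer": "D"},
]


def stub_model(prompt: str, ex: Example) -> str:
    # Pretend the model reasons literally until the hint is in the prompt.
    if HINT in prompt:
        return "lateral: questioned the default assumption"
    return "literal: took the question at face value"


def stub_categorize(reasoning: str) -> str:
    # The category is the label before the first colon.
    return reasoning.split(":", 1)[0]


def stub_review(category: str, items: Reviewed) -> str:
    # The stand-in reviewer only flags the "literal" category.
    return HINT if category == "literal" else ""


demo_result = optimize_prompt(
    toy_train, BASE_PROMPT, stub_model, stub_categorize, stub_review
)
print(demo_result)
```

Passing the model, the categorizer, and the reviewer as callbacks keeps the loop independent of any particular LLM API or annotation workflow; in practice the review step would be a manual pass rather than a function.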