The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models

Matthew Renze, Erhan Guven
2024-10-20
Abstract: In this paper, we introduce Concise Chain-of-Thought (CCoT) prompting. We compared standard CoT and CCoT prompts to see how conciseness impacts response length and correct-answer accuracy. We evaluated this using GPT-3.5 and GPT-4 with a multiple-choice question-and-answer (MCQA) benchmark. CCoT reduced average response length by 48.70% for both GPT-3.5 and GPT-4 while having a negligible impact on problem-solving performance. However, on math problems, GPT-3.5 with CCoT incurs a performance penalty of 27.69%. Overall, CCoT leads to an average per-token cost reduction of 22.67%. All code, data, and supplemental materials are available on GitHub at https://github.com/matthewrenze/jhu-concise-cot
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
This paper examines how Concise Chain-of-Thought (CCoT) prompting affects the problem-solving ability and response length of large language models (LLMs). The researchers compared standard Chain-of-Thought (CoT) prompts with CCoT prompts on multiple-choice question-answering (MCQA) tasks, using two models, GPT-3.5 and GPT-4. The study asks three questions:

1. **Reduction in response length**: Can CCoT significantly shorten LLM responses and thereby lower the cost of use?
2. **Impact on problem-solving performance**: Does CCoT hurt problem-solving performance, particularly on mathematical problems?
3. **Economy and efficiency**: Do shorter responses translate into cost savings, better energy efficiency, and faster responses?

The results show that CCoT substantially reduces response length while keeping problem-solving performance comparable to standard CoT in most cases. The exception is GPT-3.5, whose performance on mathematical problems declined under CCoT. These findings are of practical use to AI engineers optimizing LLM applications, and they offer insight for researchers studying how LLMs perform step-by-step reasoning.
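
As a rough illustration of the comparison described above, the following Python sketch builds a standard CoT prompt and a concise variant for a multiple-choice question, then computes the relative reduction in response length. The instruction wording, the function names (`build_prompt`, `mean_length`, `percent_reduction`), and the whitespace-based length count are assumptions made for illustration; they are not the paper's actual prompts or tokenizer, and the sample responses are toy strings rather than model output.

```python
from __future__ import annotations


def build_prompt(question: str, choices: dict[str, str], concise: bool) -> str:
    """Assemble a multiple-choice prompt with a CoT or CCoT instruction."""
    # Hypothetical instruction wording; the paper's exact prompts may differ.
    instruction = (
        "Think step by step, but be concise."
        if concise
        else "Think step by step."
    )
    options = "\n".join(f"{label}) {text}" for label, text in choices.items())
    return f"{instruction}\n\nQuestion: {question}\n{options}\nAnswer:"


def mean_length(responses: list[str]) -> float:
    """Average response length in whitespace-separated words."""
    # Word counts are a crude stand-in for a real tokenizer's token counts.
    if not responses:
        raise ValueError("responses must not be empty")
    return sum(len(r.split()) for r in responses) / len(responses)


def percent_reduction(baseline: float, reduced: float) -> float:
    """Return how much smaller `reduced` is than `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - reduced) / baseline


if __name__ == "__main__":
    choices = {"A": "3", "B": "4", "C": "5", "D": "6"}
    print(build_prompt("What is 2 + 2?", choices, concise=True))

    # Toy responses for illustration only, not real model output.
    cot_responses = ["First add two and two which gives four so the answer is B"]
    ccot_responses = ["2 + 2 = 4, so B"]
    saved = percent_reduction(mean_length(cot_responses), mean_length(ccot_responses))
    print(f"Response length reduced by {saved:.2f}%")
```

In practice, the same reduction calculation could be applied to token counts reported by the model API, and accuracy would be compared separately by scoring each response against the benchmark's answer key.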