Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Perfect Reasoners
Qihuang Zhong, Kang Wang, Ziyang Xu, Juhua Liu, Liang Ding, Bo Du, D. Tao
Abstract: The Chain-of-Thought (CoT) prompting strategy has enhanced the performance of Large Language Models (LLMs) across various NLP tasks. However, it still has shortcomings on complex reasoning tasks. Following the categorization of Wei et al. (2022), these include understanding errors, calculation errors, and process errors (e.g., missing steps and hallucinations). Our in-depth analysis of these error types finds that deeply understanding the whole problem is critical to solving complicated reasoning tasks. In this paper, we propose a novel prompting strategy called Deeply Understanding the Problems (DUP) prompting, inspired by how humans solve complex reasoning problems and designed to improve LLMs' comprehensive understanding of problems. It consists of three stages: 1) extract the core question; 2) find the problem-solving information based on the core question; 3) generate and extract answers with the LLM. We evaluate DUP prompting on ten diverse reasoning datasets. Experimental results suggest that DUP prompting significantly outperforms Zero-Shot CoT (Kojima et al., 2022) across all datasets. Notably, DUP sets a new state of the art on SVAMP (from 90.4% to 94.2%) and GSM8K (from 94.6% to 97.1%).
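The three stages described in the abstract can be sketched as a short prompting pipeline. This is only an illustrative sketch under stated assumptions: the function name `dup_answer`, the `llm` callable (any function that maps a prompt string to a completion string), and the prompt wordings are all hypothetical and are not the prompts used in the paper.

```python
from typing import Callable

# Hypothetical interface: any function that takes a prompt and returns text.
LLM = Callable[[str], str]


def dup_answer(problem: str, llm: LLM) -> str:
    """Sketch of a three-stage DUP-style pipeline (prompt texts are assumptions)."""
    # Stage 1: extract the core question from the full problem statement.
    core_question = llm(
        f"{problem}\n\nExtract the core question of this problem."
    ).strip()

    # Stage 2: find the problem-solving information relevant to the core question.
    info = llm(
        f"{problem}\n\nCore question: {core_question}\n"
        "List only the information needed to answer the core question."
    ).strip()

    # Stage 3a: generate a step-by-step solution using the core question
    # and the extracted information as hints.
    solution = llm(
        f"{problem}\n\nHint: {info}\nCore question: {core_question}\n"
        "Solve the problem step by step."
    ).strip()

    # Stage 3b: extract the final answer from the generated solution.
    answer = llm(
        f"{solution}\n\nState only the final answer."
    ).strip()
    return answer
```

In this sketch each stage feeds its output into the next prompt, so the final generation step is conditioned on both the core question and the filtered problem information. In practice `llm` would wrap whatever model client is in use.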