Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning

Atharva Gundawar, Mudit Verma, Lin Guan, Karthik Valmeekam, Siddhant Bhambri, Subbarao Kambhampati
2024-05-31
Abstract: As the applicability of Large Language Models (LLMs) extends beyond traditional text processing tasks, there is a burgeoning interest in their potential to excel in planning and reasoning assignments, realms traditionally reserved for System 2 cognitive competencies. Despite their perceived versatility, the research community is still unraveling effective strategies to harness these models in such complex domains. The recent discourse introduced by the paper on LLM Modulo marks a significant stride, proposing a conceptual framework that enhances the integration of LLMs into diverse planning and reasoning activities. This workshop paper delves into the practical application of this framework within the domain of travel planning, presenting a specific instance of its implementation. We are using the Travel Planning benchmark by the OSU NLP group, a benchmark for evaluating the performance of LLMs in producing valid itineraries based on user queries presented in natural language. While popular methods of enhancing the reasoning abilities of LLMs such as Chain of Thought, ReAct, and Reflexion achieve a meager 0%, 0.6%, and 0% with GPT3.5-Turbo respectively, our operationalization of the LLM-Modulo framework for TravelPlanning domain provides a remarkable improvement, enhancing baseline performances by 4.6x for GPT4-Turbo and even more for older models like GPT3.5-Turbo from 0% to 5%. Furthermore, we highlight the other useful roles of LLMs in the planning pipeline, as suggested in LLM-Modulo, which can be reliably operationalized such as extraction of useful critics and reformulator for critics.
Artificial Intelligence
What problem does this paper attempt to address?
This paper addresses how to use large language models (LLMs) more effectively for planning and reasoning in the complex task of travel planning. Although existing research shows that LLMs handle traditional text tasks well, they perform poorly on tasks that require System 2 cognitive abilities such as planning and reasoning. The paper therefore applies the LLM-Modulo framework, which aims to improve LLM performance on planning tasks by pairing the model with external validators.

### Main Research Questions

1. **Improving the Performance of LLMs in Complex Planning Tasks**: The paper explores how the LLM-Modulo framework can improve the ability of LLMs to generate reasonable and feasible plans in the specific domain of travel planning.
2. **Implementing the Specific Application of the LLM-Modulo Framework**: The paper details how to implement the LLM-Modulo framework for travel planning, including how to design and use different Critics to evaluate and improve the plans generated by LLMs.
3. **Verifying the Effectiveness of the LLM-Modulo Framework**: By comparing against existing methods (Chain of Thought, ReAct, and Reflexion), the paper shows the advantage of the LLM-Modulo framework in travel planning, especially in the Final Pass Rate.

### Research Background

- **Complexity of Travel Planning**: Travel planning involves decisions about destinations, accommodation, transportation, and activities, all of which require managing long-term dependencies and logical reasoning.
- **Limitations of Existing Methods**: Although some research indicates that LLMs perform poorly on some planning tasks, there is also a view that LLMs can assist in planning when embedded in a more integrated architecture.
- **LLM-Modulo Framework**: This framework evaluates LLM output with external validators and feeds their criticism back to the model when a plan fails, thereby improving the planning ability of the overall system.

### Experimental Results

- **Performance of Baseline Models**: Travel plans generated directly by GPT-3.5-Turbo and GPT-4-Turbo score poorly on the Final Pass Rate, at 0% and 4.4% respectively.
- **Effect of the LLM-Modulo Framework**: With the LLM-Modulo framework, the Final Pass Rate of GPT-3.5-Turbo rises from 0% to 5%, and that of GPT-4-Turbo rises from 4.4% to 20.6%.
- **Classification and Influence of Critics**: The paper groups Critics into three categories (format checking, hard constraints, and commonsense constraints) and analyzes how each type affects final performance. The results show that using all types of Critics together significantly improves plan quality.

### Conclusion

By applying the LLM-Modulo framework to travel planning, the paper demonstrates the framework's effectiveness and potential for improving the planning ability of LLMs. In particular, by introducing external validators and Critics, the LLM-Modulo framework significantly increases the Final Pass Rate of the generated plans, surpassing existing methods. The work offers a new approach to travel planning and a reference point for LLM applications in other complex planning tasks.
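
The generate-and-critique loop summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The plan shape (a list of per-day dictionaries), the three critic functions, the budget value, and the stub generator standing in for an LLM call are all hypothetical and chosen only to show how format, hard-constraint, and commonsense critics might feed criticism back to a generator.

```python
from typing import Callable, Dict, List, Optional

# Assumption: a plan is a list of per-day dicts. This shape is illustrative only.
Plan = List[Dict]
# A critic returns a feedback string when it objects, or None when satisfied.
Critic = Callable[[Plan], Optional[str]]


def format_critic(plan: Plan) -> Optional[str]:
    """Format check: every entry must carry the expected keys."""
    required = {"day", "city", "cost"}
    for entry in plan:
        missing = required - set(entry)
        if missing:
            return f"Format: an entry is missing keys {sorted(missing)}."
    return None


def make_budget_critic(budget: float) -> Critic:
    """Hard constraint (hypothetical): total cost must not exceed the budget."""
    def budget_critic(plan: Plan) -> Optional[str]:
        total = sum(entry.get("cost", 0) for entry in plan)
        if total > budget:
            return f"Hard constraint: total cost {total} exceeds budget {budget}."
        return None
    return budget_critic


def commonsense_critic(plan: Plan) -> Optional[str]:
    """Commonsense check (hypothetical): days are numbered 1, 2, 3, ..."""
    days = [entry.get("day") for entry in plan]
    if days != list(range(1, len(days) + 1)):
        return "Commonsense: days must be consecutive and start at 1."
    return None


def llm_modulo(generate: Callable[[List[str]], Plan],
               critics: List[Critic],
               max_iters: int = 10) -> Optional[Plan]:
    """Ask the generator for a plan until every critic accepts it.

    Returns the accepted plan, or None if the iteration budget runs out.
    """
    feedback: List[str] = []
    for _ in range(max_iters):
        plan = generate(feedback)  # in a real system: a back-prompted LLM call
        feedback = []
        for critic in critics:
            message = critic(plan)
            if message is not None:
                feedback.append(message)
        if not feedback:
            return plan
    return None


def make_stub_generator() -> Callable[[List[str]], Plan]:
    """Stand-in for an LLM: replays three fixed drafts, ignoring the feedback."""
    drafts = iter([
        [{"day": 1, "city": "Austin"}],                     # missing "cost"
        [{"day": 1, "city": "Austin", "cost": 900},
         {"day": 2, "city": "Austin", "cost": 400}],        # over budget
        [{"day": 1, "city": "Austin", "cost": 300},
         {"day": 2, "city": "Austin", "cost": 400}],        # acceptable
    ])

    def generate(feedback: List[str]) -> Plan:
        return next(drafts)

    return generate
```

Calling `llm_modulo(make_stub_generator(), [format_critic, make_budget_critic(1000), commonsense_critic])` rejects the first two drafts and returns the third. A real generator would place the collected feedback strings into the next prompt rather than replaying fixed drafts.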