Abstract: As the applicability of Large Language Models (LLMs) extends beyond traditional text processing tasks, there is a burgeoning interest in their potential to excel in planning and reasoning assignments, realms traditionally reserved for System 2 cognitive competencies. Despite their perceived versatility, the research community is still unraveling effective strategies to harness these models in such complex domains. The recent discourse introduced by the paper on LLM-Modulo marks a significant stride, proposing a conceptual framework that enhances the integration of LLMs into diverse planning and reasoning activities. This workshop paper delves into the practical application of this framework within the domain of travel planning, presenting a specific instance of its implementation. We use the Travel Planning benchmark by the OSU NLP group, a benchmark for evaluating the performance of LLMs in producing valid itineraries based on user queries presented in natural language. While popular methods of enhancing the reasoning abilities of LLMs such as Chain of Thought, ReAct, and Reflexion achieve a meager 0%, 0.6%, and 0% with GPT3.5-Turbo respectively, our operationalization of the LLM-Modulo framework for the TravelPlanning domain provides a remarkable improvement, enhancing baseline performance by 4.6x for GPT4-Turbo and even more for older models like GPT3.5-Turbo, from 0% to 5%. Furthermore, we highlight the other useful roles of LLMs in the planning pipeline, as suggested in LLM-Modulo, which can be reliably operationalized, such as the extraction of useful critics and acting as a reformulator for critics.

What problem does this paper attempt to address?

This paper addresses how to use large language models (LLMs) more effectively for planning and reasoning in the complex task of travel planning. Although existing research shows that LLMs perform well on traditional text tasks, they perform poorly on tasks that require System 2 cognitive abilities such as planning and reasoning. The paper therefore applies the LLM-Modulo framework, proposed in earlier work, which aims to improve LLM performance on planning tasks by pairing the model with external validators.

### Main Research Questions

1. **Improving the Performance of LLMs in Complex Planning Tasks**: The paper explores how the LLM-Modulo framework can improve the ability of LLMs to generate reasonable and feasible plans in the specific domain of travel planning.
2. **Implementing the Specific Application of the LLM-Modulo Framework**: The paper details how to implement the LLM-Modulo framework for travel planning, including how to design and use different Critics to evaluate and improve the plans generated by LLMs.
3. **Verifying the Effectiveness of the LLM-Modulo Framework**: By comparing against existing methods (Chain of Thought, ReAct, and Reflexion), the paper shows the advantage of the LLM-Modulo framework in travel planning, especially in improving the Final Pass Rate.

### Research Background

- **Complexity of Travel Planning**: Travel planning involves decisions about destination, accommodation, transportation mode, and activities, all of which require managing long-term dependencies and logical reasoning.
- **Limitations of Existing Methods**: Although some research indicates that LLMs perform poorly on some planning tasks, there is also a consensus that LLMs can assist in planning when embedded in a more integrated architecture.
- **LLM-Modulo Framework**: This framework evaluates the output of LLMs with external validators and feeds their criticism back to the model when a plan fails, thereby improving the planning ability of LLMs.

### Experimental Results

- **Performance of Baseline Models**: Travel plans generated directly by GPT-3.5-Turbo and GPT-4-Turbo achieve Final Pass Rates of only 0% and 4.4%, respectively.
- **Effect of the LLM-Modulo Framework**: With the LLM-Modulo framework, the Final Pass Rate of GPT-3.5-Turbo increases from 0% to 5%, and that of GPT-4-Turbo increases from 4.4% to 20.6%.
- **Classification and Influence of Critics**: The paper classifies Critics into three categories (format checking, hard constraints, and common-sense constraints) and analyzes the influence of each type on final performance. The results show that using all types of Critics together can significantly improve plan quality.

### Conclusion

By applying the LLM-Modulo framework to travel planning, the paper demonstrates the effectiveness and potential of this framework for improving the planning ability of LLMs. In particular, by introducing external validators and Critics, the framework significantly increases the pass rate of the final generated plan, surpassing existing methods. This research provides a new solution for travel planning and a reference for LLM applications in other complex planning tasks.
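The generate-and-critique loop summarized above (an LLM proposes a plan, Critics check it, and any criticism is fed back to the model) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the plan schema, the three critic functions, the `stub_llm` stand-in, the `llm_modulo` name, and the `max_iters` budget are all hypothetical and introduced here only for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical plan schema: one dict per day. The benchmark's real format differs.
Plan = List[dict]
# A critic returns None when satisfied, or a feedback string otherwise.
Critic = Callable[[Plan], Optional[str]]


def format_critic(plan: Plan) -> Optional[str]:
    """Format check: every day entry must carry the expected fields."""
    required = {"day", "city", "cost"}
    for entry in plan:
        missing = required - set(entry)
        if missing:
            return f"Entry {entry} is missing fields: {sorted(missing)}"
    return None


def make_budget_critic(budget: float) -> Critic:
    """Hard constraint: total cost must not exceed the user's budget."""
    def critic(plan: Plan) -> Optional[str]:
        total = sum(entry["cost"] for entry in plan)
        if total > budget:
            return f"Total cost {total} exceeds budget {budget}"
        return None
    return critic


def consecutive_days_critic(plan: Plan) -> Optional[str]:
    """Common-sense constraint: days must run 1..n without gaps or repeats."""
    days = [entry["day"] for entry in plan]
    if days != list(range(1, len(plan) + 1)):
        return f"Days must be consecutive starting at 1, got {days}"
    return None


def llm_modulo(
    generate: Callable[[str, List[str]], Plan],
    fmt_critic: Critic,
    critics: List[Critic],
    query: str,
    max_iters: int = 10,
) -> Tuple[Optional[Plan], int]:
    """Loop until every critic passes or the iteration budget runs out."""
    feedback: List[str] = []
    for iteration in range(1, max_iters + 1):
        plan = generate(query, feedback)
        # Run the format critic first; other critics assume a well-formed plan.
        fmt_msg = fmt_critic(plan)
        if fmt_msg is not None:
            feedback = [fmt_msg]
            continue
        feedback = [msg for c in critics if (msg := c(plan)) is not None]
        if not feedback:
            return plan, iteration  # all critics satisfied
    return None, max_iters  # no valid plan within the budget


def stub_llm(query: str, feedback: List[str]) -> Plan:
    """Stand-in for a real model call: over budget first, fixed after feedback."""
    costs = (900, 800) if not feedback else (500, 400)
    return [{"day": d, "city": "Austin", "cost": c} for d, c in enumerate(costs, 1)]


plan, iters = llm_modulo(
    stub_llm,
    format_critic,
    [make_budget_critic(1000), consecutive_days_critic],
    "Plan a 2-day trip under $1000",
)
print(iters, plan)
```

In this sketch the stub's first plan fails the budget critic, the feedback is passed into the next call, and the second plan passes all critics. A real setup would replace `stub_llm` with an actual model call that folds the feedback strings into the prompt.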

Robust Planning with LLM-Modulo Framework: Case Study in Travel Planning
