Optimizing Large Language Models for Dynamic Constraints through Human-in-the-Loop Discriminators

Timothy Wei, Annabelle Miin, Anastasia Miin
2024-10-24
Abstract: Large Language Models (LLMs) have recently demonstrated impressive capabilities across various real-world applications. However, due to the current text-in-text-out paradigm, it remains challenging for LLMs to handle dynamic and complex application constraints, let alone devise general solutions that meet predefined system goals. Current common practices like model finetuning and reflection-based reasoning often address these issues case-by-case, limiting their generalizability. To address this issue, we propose a flexible framework that enables LLMs to interact with system interfaces, summarize constraint concepts, and continually optimize performance metrics by collaborating with human experts. As a case in point, we initialized a travel planner agent by establishing constraints from evaluation interfaces. Then, we employed both LLM-based and human discriminators to identify critical cases and continuously improve agent performance until the desired outcomes were achieved. After just one iteration, our framework achieved a $7.78\%$ pass rate with the human discriminator (a $40.2\%$ improvement over baseline) and a $6.11\%$ pass rate with the LLM-based discriminator. Given the adaptability of our proposal, we believe this framework can be applied to a wide range of constraint-based applications and lay a solid foundation for model finetuning with performance-sensitive data samples.
Artificial Intelligence
What problem does this paper attempt to address?
The problem this paper addresses is the difficulty large language models (LLMs) have in handling dynamic and complex application constraints. Specifically, the current text-in-text-out paradigm makes it hard for LLMs to handle such constraints, and existing approaches such as model fine-tuning and reflection-based reasoning usually solve problems only case by case, which limits their generality. The paper proposes a flexible framework that enables LLMs to interact with system interfaces, summarize constraint concepts, and continuously optimize performance metrics in cooperation with human experts. The framework aims to improve LLM performance under dynamic constraints, especially when predefined system goals must be met.

### Main contributions of the paper:

1. **Concept-to-Optimization (CTOp) method**: Introduces a new learning process built on multi-agent collaboration: the planning agent establishes constraint concepts and, working with human and LLM discriminators, identifies key data cases, thereby continuously optimizing its learning process.
2. **Evaluation on the travel-planning dataset**: Evaluates the effectiveness of the method on the travel-planning dataset and reports preliminary results.
3. **Performance analysis**: Conducts a rigorous performance analysis of the proposed planning agent and discriminators, identifies the performance gap between human and LLM discriminators, and lays the foundation for future research.

### Methods of the paper:

1. **Construction of constraint concepts**:
   - **Explicit constraints**: Rules or conditions obtained directly from knowledge bases, system application programming interface (API) specifications, or other descriptive resources.
   - **Implicit constraints**: Constraints inferred from the inputs and outputs of the system through interaction with the system API.
   The paper considers both black-box and white-box systems.
2. **Optimization driven by discriminator feedback**:
   - Introduces a feedback loop with two discriminators: a human discriminator and an LLM discriminator. Both review the data cases processed by the initial agent and identify key cases that can improve planning performance.
   - The discriminators rank the data cases by evaluation difficulty, so that the planning agent can focus on these key cases to generate higher-quality plans.

### Experimental setup:

1. **Stage 1: Develop the initial planning agent**:
   - Use GPT-4 Turbo as the base travel-planning agent model and generate the initial prompt by automatically summarizing Python code.
   - Use abstract syntax tree (AST) analysis to provide input and output pairs of methods, further improving the summarization results.
2. **Stage 2: Improve the agent through the refined prompts provided by the discriminators**:
   - Refine the prompts generated in Stage 1 by selecting key plans and adding them to the prompts.
   - Evaluate the performance of the human discriminator and the LLM discriminator, using 10 randomly selected plans as the discriminator dataset.

### Performance evaluation:

- **Initial GPT-4o automatically generated prompts**: Performed poorly, below the GPT-4 Turbo baseline.
- **After adding the LLM discriminator**: The final pass rate increased from 5.55% to 6.11%.
- **After adding the human discriminator**: The pass rate increased from 5.55% to 7.78%, a 40.2% relative improvement over the baseline.
- **Manually modified prompts**: Performed far better than the GPT-4 Turbo baseline, with a final pass rate 2.3 times that of the baseline (12.78% vs. 5.55%).

### Key findings:

- Manually refining the initial prompt significantly improves the performance of the agent.
- Both human experts and LLM discriminators perform well in identifying useful plans for prompt refinement.
- Restructuring the reference information from JSON format to CSV format significantly improves performance.

### Limitations:

- Limited data: The travel information, accommodation, restaurants, and attraction options in the reference dataset are limited and may contain unsafe or incorrect information.
- Automation dependence: The framework relies mainly on automatically generating prompts and evaluating results, but the current performance of the LLM discriminator is limited, so more human intervention is required.

In conclusion, the paper proposes a new framework that optimizes the performance of LLMs under dynamic constraints by combining human experts and LLM discriminators, shows initial success on travel-planning tasks, and provides directions for future research.
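
The discriminator-driven refinement loop summarized under Methods and Experimental setup can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Case`, `select_key_cases`, `refine_prompt`, `pass_rate`, and `optimize` are hypothetical, the discriminator is modeled as a plain scoring function standing in for a human expert or an LLM, and `run_agent` is a placeholder for whatever runs the planning agent and evaluates its plans.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Case:
    """One evaluated plan: the query, the agent's plan, and whether it passed."""
    query: str
    plan: str
    passed: bool


# Assumption: a discriminator is any callable that maps a case to a score,
# where a higher score marks a more critical case. In the paper this role is
# played by a human expert or an LLM.
Discriminator = Callable[[Case], float]


def select_key_cases(cases: Sequence[Case], score: Discriminator, k: int) -> List[Case]:
    """Rank cases by discriminator score (highest first) and keep the top k."""
    return sorted(cases, key=score, reverse=True)[:k]


def refine_prompt(prompt: str, key_cases: Sequence[Case]) -> str:
    """Append the selected cases to the prompt as worked examples."""
    examples = "\n".join(
        f"Query: {c.query}\nPlan: {c.plan}\nPassed: {c.passed}" for c in key_cases
    )
    return f"{prompt}\n\nKey cases:\n{examples}"


def pass_rate(cases: Sequence[Case]) -> float:
    """Fraction of cases whose plan passed evaluation (0.0 for no cases)."""
    return sum(c.passed for c in cases) / len(cases) if cases else 0.0


def optimize(
    prompt: str,
    run_agent: Callable[[str], List[Case]],  # hypothetical: plans and evaluates
    score: Discriminator,
    target: float,
    k: int = 2,
    max_iters: int = 3,
) -> Tuple[str, float]:
    """Evaluate, pick key cases, refine the prompt; stop at target or max_iters."""
    cases = run_agent(prompt)
    for _ in range(max_iters):
        if pass_rate(cases) >= target:
            break
        prompt = refine_prompt(prompt, select_key_cases(cases, score, k))
        cases = run_agent(prompt)
    return prompt, pass_rate(cases)
```

Modeling the discriminator as a scoring function keeps the loop identical whether the scores come from a person or from a model, which mirrors the paper's comparison of the two under the same procedure. The stopping rule (a target pass rate plus an iteration cap) is likewise an assumption made for the sketch.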