Abstract: Prompt-tuning has emerged as a promising method for adapting pre-trained models to downstream tasks or aligning them with human preferences. Prompt learning is widely used in NLP but has seen limited use in RL, because RL prompts carry complex physical meaning and environment-specific information. These factors require supervised learning to imitate the demonstrations, and the prompts may lose their meaning after learning. Additionally, directly extending prompt-tuning approaches to RL is challenging: RL prompts guide agent behavior based on environmental modeling and analysis rather than filling in missing information, so adjusting the prompt format for downstream tasks, as in NLP, is unlikely to yield significant improvements. In this work, we propose the Prompt-Tuning DT algorithm to address these challenges. It uses trajectory segments as prompts to guide RL agents in acquiring environmental information, and it optimizes the prompts via black-box tuning so that they contain more relevant information, thereby enabling agents to make better decisions. Our approach randomly samples from a Gaussian distribution to fine-tune the elements of the prompt trajectory and uses a preference ranking function to find the optimization direction, thereby providing more informative prompts and guiding the agent towards specific preferences in the target environment. Extensive experiments show that with only 0.03% of the parameters learned, Prompt-Tuning DT achieves comparable or even better performance than full-model fine-tuning in low-data scenarios. Our work contributes to the advancement of prompt-tuning approaches in RL, providing a promising direction for optimizing large RL agents for specific preference tasks.
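
To make the black-box tuning idea concrete, the following is a minimal sketch of one rank-based update step on a flattened prompt vector. It is a generic zeroth-order scheme in the spirit of the abstract (Gaussian perturbations plus a ranking of candidates), not the exact procedure of Prompt-Tuning DT. The function name `rank_based_prompt_update`, the parameters `sigma`, `lr`, and `num_samples`, and the black-box `score_fn` (standing in for an evaluation of the frozen agent under a given prompt) are all assumptions introduced for illustration.

```python
import random


def rank_based_prompt_update(prompt, score_fn, rng, sigma=0.1, lr=0.1, num_samples=16):
    """One gradient-free update of a prompt, given as a flat list of floats.

    score_fn is treated as a black box: it maps a candidate prompt to a
    scalar where higher means more preferred (hypothetical stand-in for
    evaluating a frozen agent conditioned on that prompt).
    """
    dim = len(prompt)
    # Sample one Gaussian perturbation per candidate prompt.
    noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_samples)]
    # Score every perturbed prompt without using any gradient information.
    scores = [
        score_fn([p + sigma * e for p, e in zip(prompt, eps)])
        for eps in noise
    ]
    # Turn scores into centered ranks in [-0.5, 0.5]: only the ordering matters.
    order = sorted(range(num_samples), key=lambda i: scores[i])
    weights = [0.0] * num_samples
    for rank, i in enumerate(order):
        weights[i] = rank / (num_samples - 1) - 0.5
    # The rank-weighted average of the perturbations gives the update direction.
    direction = [
        sum(w * eps[j] for w, eps in zip(weights, noise)) / num_samples
        for j in range(dim)
    ]
    return [p + lr * d for p, d in zip(prompt, direction)]
```

In this sketch only the prompt entries are updated, which mirrors the abstract's point that a very small fraction of parameters is learned while the model itself stays frozen. Using ranks rather than raw scores makes the step insensitive to the scale of `score_fn`.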

Hierarchical Prompt Decision Transformer: Improving Few-Shot Policy Generalization with Global and Adaptive Guidance

Rethinking Decision Transformer via Hierarchical Reinforcement Learning

P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer

A Minimalist Prompt for Zero-Shot Policy Learning

Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer

Hierarchical Prompt Tuning for Few-Shot Multi-Task Learning

Prompt-Tuning Decision Transformer with Preference Ranking

Hierarchical Prompting Improves Visual Recognition On Accuracy, Data Efficiency and Explainability

ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning

Prompt Tuning with Diffusion for Few-Shot Pre-trained Policy Generalization

Evolving Parameterized Prompt Memory for Continual Learning

Decision Mamba: Reinforcement Learning via Hybrid Selective Sequence Modeling

Self-supervised Meta-Prompt Learning with Meta-Gradient Regularization for Few-shot Generalization

Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach

Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement

TransPrompt - Towards an Automatic Transferable Prompting Framework for Few-shot Text Classification

Decision Transformer: Reinforcement Learning via Sequence Modeling

Predictive Coding for Decision Transformer

Instance-Aware Hierarchical Structured Policy for Prompt Learning in Vision-Language Models

UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers