Abstract: Multi-task multi-agent reinforcement learning aims to control multiple agents so that they perform well on multiple tasks. It faces three core challenges: the varying number of agents and entities, the disparities in cooperative behaviors among different tasks, and the training imbalance caused by varying task difficulty. To address these issues, we propose a novel framework named Task-Entity Transformer Qmix (TETQmix), which employs pretrained language models for task encoding, uses the proposed Task-Entity Transformer to handle observations across various tasks, and adjusts task learning weights to achieve balanced multi-task training. The Task-Entity Transformer not only handles multi-task scenarios with varying numbers of agents and entities, but also uses cross-attention modules to integrate observation and task embeddings, so that each agent can obtain individual values and decisions for multiple tasks. We then use a transformer-based mixer to monotonically combine the individual values, and train the whole network’s parameters using temporal-difference errors. To facilitate multi-task training, we define task regret as the difference between the current-stage return and the candidate best one, and adjust the learning weight of each task based on its task regret. Experiments are conducted on both simulated multi-particle environments and real-world multi-robot systems. Compared with existing baselines, our method is not only superior in multi-task learning efficiency but also shows promising transfer ability on unseen tasks.

Note to Practitioners: The flexibility of multi-agent systems makes them well suited to multiple tasks. Compared with designing a different decision model for each task, it is more convenient to use a single decision model for all of them. In addition, integrating trajectory data from similar tasks when training a multi-task decision model makes maximum use of that data. Natural language provides a powerful tool for describing the task context and emphasizing the similarities and differences among tasks. Pretrained language models can encode the task context, based on which the decision model can adjust its output distribution for different tasks and even synthesize decisions from existing, similar tasks to achieve promising zero-shot and few-shot transfer performance on unseen tasks. With our proposed TETQmix, practitioners can realize multi-task capability in multi-agent systems and improve generalization across a variety of scenarios.
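The task-regret weighting described in the abstract can be illustrated with a minimal sketch. The abstract states only that regret is the difference between the current-stage return and the candidate best one, and that learning weights are adjusted based on it. The sign convention (best minus current, floored at zero), the softmax weighting rule, the `temperature` parameter, and the task names and return values below are all assumptions made for illustration, not the paper's actual formulation.

```python
import math


def task_regrets(current_returns, best_returns):
    """Per-task regret: candidate best return minus current-stage return.

    Assumption: regret is floored at zero so a task that currently
    exceeds its recorded best is treated as having no regret.
    """
    return {
        task: max(best_returns[task] - current_returns[task], 0.0)
        for task in current_returns
    }


def regret_weights(regrets, temperature=1.0):
    """Turn regrets into learning weights with a softmax (assumed rule).

    Tasks with larger regret receive larger weight; weights sum to 1.
    """
    max_regret = max(regrets.values())  # subtract max for numerical stability
    exps = {
        task: math.exp((regret - max_regret) / temperature)
        for task, regret in regrets.items()
    }
    total = sum(exps.values())
    return {task: value / total for task, value in exps.items()}


# Hypothetical tasks and returns, for illustration only.
current = {"navigation": 8.0, "coverage": 3.0, "transport": 5.0}
best = {"navigation": 10.0, "coverage": 9.0, "transport": 6.0}

regrets = task_regrets(current, best)
weights = regret_weights(regrets, temperature=2.0)

for task in weights:
    print(f"{task}: regret={regrets[task]:.1f}, weight={weights[task]:.3f}")
```

In this sketch the task that lags furthest behind its best return ("coverage") receives the largest weight, so its temporal-difference loss would count more in a weighted sum of per-task losses. A higher temperature flattens the weights toward uniform training.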

T3S: Improving Multi-Task Reinforcement Learning with Task-Specific Feature Selector and Scheduler

Efficient Multi-Task Reinforcement Learning via Task-Specific Action Correction

A Brain-Inspired Incremental Multi-task Reinforcement Learning Approach

Leveraging the Efficiency of Multi-Task Robot Manipulation Via Task-Evoked Planner and Reinforcement Learning

A Deep Multi-Task Representation Learning Method for Time Series Classification and Retrieval

A Decentralized Policy Gradient Approach to Multi-task Reinforcement Learning

Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks

QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing

Multi-Task Reinforcement Learning for Quadrotors

Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts

Shared-unique Features and Task-aware Prioritized Sampling on Multi-task Reinforcement Learning

Single-shot Feature Selection for Multi-task Recommendations

Multi-Task Reinforcement Learning with Soft Modularization

Contrastive Modules with Temporal Attention for Multi-Task Reinforcement Learning

Multi-Task Recommendations with Reinforcement Learning

Efficient Multi-task Prompt Tuning for Recommendation

Multi-Task Multi-Agent Reinforcement Learning with Task-Entity Transformers and Value Decomposition Training

Theoretical Study of Conflict-Avoidant Multi-Objective Reinforcement Learning

Multi-task Deep Reinforcement Learning for Scalable Parallel Task Scheduling

CFS-MTL: A Causal Feature Selection Mechanism for Multi-task Learning Via Pseudo-intervention

Robust Estimator Based Adaptive Multi-Task Learning