Abstract: This paper addresses the problem of learning compressed state representations for multi-agent tasks. Under the assumption of rich observations, we argue that state representations should be compressed both spatially and temporally to enable efficient prioritization of task-relevant features, which existing works typically fail to do. To overcome this limitation, we propose a novel method named Spatio-Temporal stAte compRession (STAR), which explicitly defines both spatial and temporal compression operations on the learned state representations to encode per-agent task-relevant features. Specifically, we first formalize the problem by introducing the Task Informed Partially Observable Stochastic Game (TI-POSG). Then, we identify spatial representation compression in the TI-POSG as encoding the latent states from the joint observations of all agents, and achieve it by learning representations that approximate the latent states based on an information-theoretic principle. Next, we extract the task-relevant features of each agent from these representations by aligning them according to their reward similarities, which we regard as temporal representation compression. Structurally, we implement these two compressions by learning a set of agent-specific decoding functions and incorporate them into a critic shared by the agents for scalable learning. We evaluate our method by developing decentralized policies on 12 maps of the StarCraft Multi-Agent Challenge benchmark, where its superior performance demonstrates its effectiveness.
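
The abstract does not state the objective used to align representations by reward similarity, so the following is only a minimal sketch of one way such an alignment could be scored, not the loss used by STAR. It assumes that two representations should lie about as far apart as their rewards differ; the function names `l2_distance` and `reward_alignment_loss`, the Euclidean distance, and the squared-error form are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reward_alignment_loss(representations, rewards):
    """Illustrative (assumed) alignment score, not the paper's objective.

    For every pair of samples, compare the distance between their
    representations with the absolute gap between their rewards, and
    return the mean squared mismatch over all pairs.
    """
    pairs = list(combinations(range(len(rewards)), 2))
    if not pairs:
        return 0.0
    total = 0.0
    for i, j in pairs:
        rep_dist = l2_distance(representations[i], representations[j])
        reward_gap = abs(rewards[i] - rewards[j])
        total += (rep_dist - reward_gap) ** 2
    return total / len(pairs)


# Representations whose pairwise distances match the reward gaps
# incur no penalty under this assumed score.
aligned = reward_alignment_loss([[0.0], [1.0], [3.0]], [0.0, 1.0, 3.0])

# Collapsed representations ignore reward differences and are penalized.
collapsed = reward_alignment_loss([[0.0], [0.0], [0.0]], [0.0, 1.0, 3.0])
```

Under this assumption, a representation that discards reward-relevant differences (the collapsed case) scores worse than one that preserves them, which is the intuition behind grouping states by reward similarity.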

Streaming Sequence Transduction through Dynamic Compression

TRACE: Real-time Compression of Streaming Trajectories in Road Networks

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

Shifted Chunk Encoder for Transformer Based Streaming End-to-End ASR

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

Compute Cost Amortized Transformer for Streaming ASR

Streaming Audio Transformers for Online Audio Tagging

Streaming Punctuation for Long-form Dictation with Transformers

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations

STAR: Spatio-Temporal State Compression for Multi-Agent Tasks with Rich Observations

Implementing and Optimizing the Scaled Dot-Product Attention on Streaming Dataflow

Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition

Accurate and Fast Compressed Video Captioning

One in A Hundred: Selecting the Best Predicted Sequence from Numerous Candidates for Speech Recognition

Real-time Online Video Detection with Temporal Smoothing Transformers

Exploring RWKV for Memory Efficient and Low Latency Streaming ASR

Transtreaming: Adaptive Delay-aware Transformer for Real-time Streaming Perception

Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context for Continuous Speech Recognition

Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

A Joint Online Transcoding and Delivery Approach for Dynamic Adaptive Streaming

Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR