PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control

Ruijie Zheng, Ching-An Cheng, Hal Daumé III, Furong Huang, Andrey Kolobov
2024-06-06
Abstract: Temporal action abstractions, along with belief state representations, are a powerful knowledge sharing mechanism for sequential decision making. In this work, we propose a novel view that treats inducing temporal action abstractions as a sequence compression problem. To do so, we bring a subtle but critical component of LLM training pipelines -- input tokenization via byte pair encoding (BPE) -- to the seemingly distant task of learning skills of variable time span in continuous control domains. We introduce an approach called Primitive Sequence Encoding (PRISE) that combines continuous action quantization with BPE to learn powerful action abstractions. We empirically show that high-level skills discovered by PRISE from a multitask set of robotic manipulation demonstrations significantly boost the performance of both multitask imitation learning as well as few-shot imitation learning on unseen tasks. Our code is released at https://github.com/FrankZheng2022/PRISE.
Machine Learning
What problem does this paper attempt to address?
This paper addresses how to improve the performance of Behavior Cloning (BC) on downstream tasks in continuous control, such as robotic manipulation, by learning temporal action abstractions. It proposes a method named Primitive Sequence Encoding (PRISE), which treats the learning of temporal action abstractions as a sequence compression problem. PRISE borrows discrete coding and sequence compression techniques from Natural Language Processing (NLP), in particular Byte Pair Encoding (BPE), to learn temporal action abstractions from multitask offline demonstration datasets.

### Main Problems and Solutions

1. **High-Dimensional Observations and Complex Continuous Action Spaces**
   - Sequential decision-making problems, especially in robotic manipulation, often involve high-dimensional observations (such as images) and complex continuous action spaces, which make it difficult to learn effective policies directly.
   - **Solution**: Construct abstractions, that is, compact belief-state and action representations that generalize across tasks, so that learning in new scenarios becomes more robust and data-efficient.
2. **Learning of Temporal Action Abstractions**
   - In continuous control in particular, learning temporal action abstractions (such as representations of multi-step primitive behaviors) has not fully benefited from methods that have succeeded in other fields.
   - **Solution**: The paper applies discrete coding and sequence compression techniques to the learning of temporal action abstractions. Specifically, it quantizes continuous actions into discrete codes and applies the BPE algorithm to identify variable-duration action primitives (skills) with the desired properties.
3. **Improving the Learning Efficiency of Downstream Tasks**
   - The temporal action abstractions that PRISE learns from multitask robotic manipulation demonstrations significantly improve the performance of behavior cloning in downstream tasks.
   - **Solution**: PRISE quantizes the continuous action space into discrete codes and then uses the BPE algorithm to extract temporally extended action primitives from the resulting code sequences. These primitives perform better in downstream tasks, especially in behavior cloning.

### Main Contributions of PRISE

- **Innovatively Combining NLP Methods**: Applying discrete coding and sequence compression techniques from NLP (such as BPE) to the learning of temporal action abstractions in continuous control.
- **Improving Learning Efficiency**: Through the learned temporal action abstractions, PRISE significantly improves the performance of behavior cloning in downstream tasks, surpassing some existing strong baseline methods.
- **Detailed Experimental Verification**: The effectiveness of PRISE is verified through experiments on multiple benchmark datasets.

In summary, this paper addresses the problem of learning temporal action abstractions in continuous control and, by introducing the PRISE method, improves the performance of behavior cloning in downstream tasks.
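
The quantize-then-compress idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a fixed, hand-written codebook with nearest-neighbor lookup in place of a learned quantizer, and the function names (`quantize`, `learn_bpe`, `encode`) are hypothetical rather than taken from the PRISE repository. It only shows how BPE merges over discrete action codes can yield variable-length "skill" tokens.

```python
from collections import Counter


def quantize(actions, codebook):
    """Map each continuous action to the index of its nearest codebook entry.

    Assumption: the codebook is given; a real system would learn it.
    """
    codes = []
    for action in actions:
        dists = [sum((x - c) ** 2 for x, c in zip(action, centre))
                 for centre in codebook]
        codes.append(dists.index(min(dists)))
    return codes


def merge_pair(seq, pair, new_token):
    """Replace every non-overlapping occurrence of `pair` in `seq` with `new_token`."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def learn_bpe(sequences, num_merges, first_new_token):
    """Learn BPE merges over discrete action-code sequences.

    Returns (merges, skills): `merges` is an ordered list of (pair, token),
    and `skills` maps each new token to the tuple of primitive codes it expands to.
    """
    seqs = [list(s) for s in sequences]
    merges, skills = [], {}
    next_token = first_new_token
    for _ in range(num_merges):
        counts = Counter()
        for s in seqs:
            counts.update(zip(s, s[1:]))
        if not counts:
            break
        # Most frequent adjacent pair; ties go to the pair seen first.
        pair, freq = max(counts.items(), key=lambda kv: kv[1])
        if freq < 2:
            break  # nothing repeats, so no further compression is possible
        left = skills.get(pair[0], (pair[0],))
        right = skills.get(pair[1], (pair[1],))
        skills[next_token] = left + right
        merges.append((pair, next_token))
        seqs = [merge_pair(s, pair, next_token) for s in seqs]
        next_token += 1
    return merges, skills


def encode(codes, merges):
    """Tokenize a new code sequence by applying the learned merges in order."""
    seq = list(codes)
    for pair, token in merges:
        seq = merge_pair(seq, pair, token)
    return seq


# Toy data: a 3-entry codebook over 2-D actions and two short demonstrations.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
demo_a = [(0.1, 0.0), (0.9, 0.1), (0.0, 1.1), (0.1, -0.1), (1.0, 0.0), (0.1, 0.9)]
demo_b = [(0.0, 0.0), (1.1, 0.0), (0.0, 1.0), (0.9, 0.0)]

code_seqs = [quantize(demo_a, codebook), quantize(demo_b, codebook)]
merges, skills = learn_bpe(code_seqs, num_merges=2, first_new_token=len(codebook))
tokens = encode([0, 1, 2, 1, 0], merges)
```

On this toy data the recurring code pattern 0, 1, 2 is merged into a single token, so `skills` contains a three-step primitive and `encode` shortens any new sequence containing that pattern. In a downstream policy, such tokens would act as temporally extended actions that are decoded back into primitive codes at execution time.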