Abstract: Deep neural network (DNN)-based policy models like vision-language-action (VLA) models are transformative in automating complex decision-making across applications by interpreting multi-modal data. However, scaling these models greatly increases computational costs, which presents challenges in fields like robot manipulation and autonomous driving that require quick, accurate responses. To address the need for deployment on resource-limited hardware, we propose a new quantization framework for imitation learning (IL)-based policy models that fine-tunes parameters during training to enhance robustness against low-bit precision errors, thereby maintaining efficiency and reliability under constrained conditions. Our evaluations on representative robot manipulation tasks with 4-bit weight quantization on a real edge GPU demonstrate that our framework achieves up to 2.5x speedup and 2.5x energy savings while preserving accuracy. For 4-bit weight and activation quantized self-driving models, the framework achieves up to 3.7x speedup and 3.1x energy savings on a low-end GPU. These results highlight the practical potential of deploying IL-based policy models on resource-constrained devices.

What problem does this paper attempt to address?

### Problems the paper attempts to solve

The paper addresses the excessive computational and memory costs of deploying deep neural networks (DNNs) on resource-constrained hardware in fields such as robot control and autonomous driving. Policy models based on imitation learning (IL), such as vision-language-action models, perform well at automating complex decision-making, but scaling them up significantly increases computational cost. This is especially problematic in tasks that require rapid and accurate responses, such as robot manipulation and autonomous driving, and it makes these models difficult to deploy on hardware with limited resources.

To meet this challenge, the paper proposes a new quantization framework, **Quantization-Aware Imitation Learning (QAIL)**, which fine-tunes parameters during training to make the model robust to low-bit-precision errors, thereby maintaining efficiency and reliability under resource-constrained conditions. The paper also introduces **Quantization-Robust Behavior Cloning (QBC)** to further improve the performance of the quantized model and preserve its accuracy in long-sequence tasks.

### Main contributions

1. **QAIL framework**: Integrates quantization into the imitation learning process, optimizing parameters for the low-precision setting and reducing the impact of quantization errors on model performance.
2. **QBC mechanism**: Improves the performance of the quantized model in complex tasks by minimizing the behavioral difference between the quantized policy and the full-precision policy.
3. **Experimental verification**: Reports experiments on robot manipulation and autonomous driving tasks. With 4-bit weight quantization on an edge GPU, the manipulation models reach up to 2.5x speedup and 2.5x energy savings. With 4-bit weight and activation quantization on a low-end GPU, the self-driving models reach up to 3.7x speedup and 3.1x energy savings, while maintaining a high success rate.
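To make the phrase "quantization errors" concrete, here is a minimal sketch of 4-bit weight quantization. It assumes a symmetric uniform quantizer with a single shared scale, which is a common textbook choice and not necessarily the scheme used in the paper. The function names and the sample weights are hypothetical.

```python
def quantize_symmetric(weights, num_bits=4):
    """Map floats to signed integer codes using one shared scale.

    Assumption: symmetric uniform quantization; the paper's actual
    quantizer may differ.
    """
    qmax = 2 ** (num_bits - 1) - 1          # 7 for 4-bit signed codes
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest code and clamp to the representable range.
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover the low-precision approximation of the original weights."""
    return [c * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [0.70, -0.33, 0.12, 0.04, -0.70]
codes, scale = quantize_symmetric(weights)
approx = dequantize(codes, scale)

# Per-weight quantization error: the quantity a quantization-aware
# method tries to make the policy robust to.
max_error = max(abs(w - a) for w, a in zip(weights, approx))
```

With round-to-nearest, each weight moves by at most half a quantization step (`scale / 2`). At 4 bits there are only 15 signed levels, so this step is coarse, which is why small errors can accumulate over long action sequences.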
### Key formulas

- **IL loss function**:
  \[ L_{\text{IL}}(\theta) = -\frac{1}{|D_E|}\sum_{(s, a)\in D_E}\log\pi_\theta(a \mid s) \]
- **QBC loss function**:
  \[ L_{\text{QBC}}(\theta) = \mathbb{E}_{s_t\sim\pi^q_\theta}\left[D\left(\pi^q_\theta(a_t \mid s_t),\ \pi^{FP}(a_t \mid s_t)\right)\right] \]
- **Total loss function**:
  \[ L_{\text{total}}(\theta) = L_{\text{QAIL}}(\theta) + \lambda L_{\text{QBC}}(\theta) \]

Through these methods, the paper shows how to efficiently deploy policy models based on imitation learning on resource-constrained devices while maintaining accuracy and stability.
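The three losses above can be sketched in plain Python. This is an illustration under several assumptions that the summary does not confirm: the policy is a linear softmax over discrete actions, the discrepancy D is the KL divergence, L_QAIL is taken to be the IL loss evaluated on the quantized policy, the states for the QBC term come from the demonstration set rather than from rollouts of the quantized policy, and λ = 0.5 is arbitrary. All function names and data are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)                       # subtract the max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fake_quantize(weights, num_bits=4):
    """Quantize then dequantize, so the policy sees low-precision weights."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [[max(-qmax, min(qmax, round(w / scale))) * scale for w in row]
            for row in weights]


def policy(weights, state):
    """Assumed linear softmax policy: one weight row per discrete action."""
    logits = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return softmax(logits)


def il_loss(weights, demos):
    """L_IL: mean negative log-likelihood of expert actions over D_E."""
    return -sum(math.log(policy(weights, s)[a]) for s, a in demos) / len(demos)


def kl_divergence(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def qbc_loss(q_weights, fp_weights, states):
    """L_QBC with D assumed to be KL(quantized policy || full-precision policy)."""
    return sum(kl_divergence(policy(q_weights, s), policy(fp_weights, s))
               for s in states) / len(states)


def total_loss(weights, fp_weights, demos, lam=0.5):
    """L_total = L_QAIL + lambda * L_QBC, with L_QAIL assumed to be the
    IL loss of the fake-quantized policy."""
    q_weights = fake_quantize(weights)
    states = [s for s, _ in demos]        # assumption: demo states, not rollouts
    return il_loss(q_weights, demos) + lam * qbc_loss(q_weights, fp_weights, states)


# Hypothetical full-precision weights and expert (state, action) pairs.
fp_weights = [[1.3, -0.4], [-0.9, 0.8]]
demos = [([1.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 1.0], 0)]
loss_il = il_loss(fp_weights, demos)
loss_total = total_loss(fp_weights, fp_weights, demos)
```

The sketch only evaluates the losses. Training on them would also need gradients through the rounding step, which quantization-aware training commonly handles with a straight-through estimator. Whether the paper does so is not stated in this summary.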

Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control

HotaQ: Hardware Oriented Token Adaptive Quantization for Large Language Models

PackQViT: Faster Sub-8-bit Vision Transformers Via Full and Packed Quantization on the Mobile.

Quantized deep learning models on low-power edge devices for robotic systems

Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers

Just Round: Quantized Observation Spaces Enable Memory Efficient Learning of Dynamic Locomotion

Policy Compression for Intelligent Continuous Control on Low-Power Edge Devices

Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning

Prompt, Plan, Perform: LLM-based Humanoid Control via Quantized Imitation Learning

Automatic low-bit hybrid quantization of neural networks through meta learning

Hardware-Centric AutoML for Mixed-Precision Quantization

Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge

TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation

VQ-ACE: Efficient Policy Search for Dexterous Robotic Manipulation via Action Chunking Embedding

Joint Accuracy and Latency Optimization for Quantized Federated Learning in Vehicular Networks

Learning Accurate Low-bit Quantization towards Efficient Computational Imaging

FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs

Dataflow-based Joint Quantization of Weights and Activations for Deep Neural Networks

Neuromorphic quadratic programming for efficient and scalable model predictive control

Learning to Quantize Deep Networks by Optimizing Quantization Intervals with Task Loss

An Efficient Model-Based Approach on Learning Agile Motor Skills without Reinforcement