Abstract: Model compression methods are being developed to bridge the gap between the massive scale of neural networks and the limited hardware resources of edge devices. Real-world applications deployed on resource-limited hardware platforms usually face several hardware constraints at once, so existing model compression approaches that optimize only a single hardware objective are ineffective for them. In this article, we propose an automated pruning method called multi-constrained model compression (MCMC) that optimizes multiple hardware targets, such as latency, floating point operations (FLOPs), and memory usage, while minimizing the impact on accuracy. Specifically, we propose an improved multi-objective reinforcement learning (MORL) algorithm, the one-stage envelope deep deterministic policy gradient (DDPG) algorithm, to determine the pruning strategy for neural networks. The improved one-stage envelope DDPG algorithm reduces exploration time and offers greater flexibility in adjusting target priorities, which makes it better suited to pruning tasks. For instance, on the visual geometry group (VGG)-16 network, our method achieved an 80% reduction in FLOPs, a 2.31× reduction in memory usage, and a 1.92× speedup, with an accuracy improvement of 0.09% compared with the baseline. On larger datasets, such as ImageNet, we reduced FLOPs by 50% for MobileNet-V1, yielding a 4.7× speedup and 1.48× memory compression while maintaining the same accuracy. When applied to edge devices, such as the Jetson Xavier NX, our method achieved a 71% reduction in FLOPs for MobileNet-V1, yielding a 1.63× speedup, 1.64× memory compression, and an accuracy improvement.
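The idea of adjusting target priorities across several hardware objectives can be illustrated with a minimal sketch. It is not the one-stage envelope DDPG algorithm from the article: it only shows generic linear scalarization of a vector-valued reward under a preference vector, plus an envelope-style selection that picks, among candidate value vectors, the one that scores best under the current preference. The function names, the objective ordering, and all numbers are hypothetical.

```python
from typing import List, Sequence


def scalarize(values: Sequence[float], preference: Sequence[float]) -> float:
    """Weighted sum of per-objective values under a preference vector."""
    if len(values) != len(preference):
        raise ValueError("values and preference must have the same length")
    return sum(v * w for v, w in zip(values, preference))


def envelope_pick(candidates: Sequence[Sequence[float]],
                  preference: Sequence[float]) -> List[float]:
    """Return the candidate value vector with the highest scalarized score."""
    return list(max(candidates, key=lambda q: scalarize(q, preference)))


# Assumed objective order (illustrative only):
# (accuracy, FLOPs saving, latency saving, memory saving)
reward = [0.5, 0.25, 0.5, 0.25]
balanced_pref = [0.5, 0.5, 0.0, 0.0]    # weight accuracy and FLOPs equally
score = scalarize(reward, balanced_pref)

# Two made-up candidate value vectors, e.g. from two pruning strategies.
candidates = [
    [1.0, 0.0, 0.0, 0.0],   # favors accuracy
    [0.0, 1.0, 1.0, 0.0],   # favors FLOPs and latency savings
]
accuracy_first = envelope_pick(candidates, [1.0, 0.0, 0.0, 0.0])
latency_first = envelope_pick(candidates, [0.0, 0.0, 1.0, 0.0])
```

Changing the preference vector changes which candidate wins without changing the candidates themselves, which is the sense in which a preference-conditioned agent can re-prioritize latency, FLOPs, or memory.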

GTCRN: A Speech Enhancement Model Requiring Ultralow Computational Resources

MCMC: Multi-Constrained Model Compression Via One-Stage Envelope Reinforcement Learning.

Dynamic Gated Recurrent Neural Network for Compute-efficient Speech Enhancement

TFCN: Temporal-Frequential Convolutional Network for Single-Channel Speech Enhancement

Dense-TSNet: Dense Connected Two-Stage Structure for Ultra-Lightweight Speech Enhancement

Lite-RTSE: Exploring a Cost-Effective Lite DNN Model for Real-Time Speech Enhancement in RTC Scenarios

EffCRN: An Efficient Convolutional Recurrent Network for High-Performance Speech Enhancement

TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression For On-device ASR Models

DPCRN: Dual-Path Convolution Recurrent Network for Single Channel Speech Enhancement

Single-Channel Speech Enhancement Algorithm Based on ME-MGCRN in Low Signal-to-Noise Scenario

Convolutional gated recurrent unit networks based real-time monaural speech enhancement

EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech

Residual Convolutional CTC Networks for Automatic Speech Recognition.

Speech enhancement using progressive learning-based convolutional recurrent neural network

S-DCCRN: Super Wide Band DCCRN with Learnable Complex Feature for Speech Enhancement

Advanced Recurrent Network-Based Hybrid Acoustic Models for Low Resource Speech Recognition

Towards efficient models for real-time deep noise suppression

SASEGAN-TCN: Speech enhancement algorithm based on self-attention generative adversarial network and temporal convolutional network

Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning

Efficiently Trained Low-Resource Mongolian Text-to-Speech System Based On FullConv-TTS