Abstract: Structured pruning is one of the most popular approaches for compressing heavy deep neural networks (DNNs) into compact sub-networks while retaining performance. Existing methods suffer from multi-stage procedures that demand significant engineering effort and human expertise. The Only-Train-Once (OTO) series was recently proposed to resolve many of these pain points by streamlining the workflow, automatically conducting (i) search space generation, (ii) structured sparse optimization, and (iii) sub-network construction. However, the built-in sparse optimizers in the OTO series, i.e., the Half-Space Projected Gradient (HSPG) family, require hyper-parameter tuning and control sparsity exploration only implicitly, and consequently require intervention by human experts. To address these limitations, we propose a Hybrid Efficient Structured Sparse Optimizer (HESSO). HESSO automatically and efficiently trains a DNN to produce a high-performing sub-network. It is also almost tuning-free and integrates easily into generic training applications. To address another common issue, the irreversible performance collapse observed when pruning DNNs, we further propose a Corrective Redundant Identification Cycle (CRIC) for reliably identifying indispensable structures. We numerically demonstrate the efficacy of HESSO and its enhanced version HESSO-CRIC on a variety of applications ranging from computer vision to natural language processing, including large language models. The numerical results show that HESSO achieves performance competitive with, and in some cases superior to, various state-of-the-art methods, and supports most DNN architectures. Meanwhile, CRIC effectively prevents irreversible performance collapse and further enhances the performance of HESSO on certain applications.
The code is available at <a class="link-external link-https" href="https://github.com/microsoft/only_train_once" rel="external noopener nofollow">https://github.com/microsoft/only_train_once</a>.
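To make the general idea of structured pruning concrete, the following is a minimal sketch of a generic magnitude-based baseline. It is not HESSO, CRIC, or the API of the only_train_once library. It assumes each row of a weight matrix is one removable structure, uses the row's L2 norm as a stand-in saliency score, and zeroes the lowest-scoring rows before dropping them to form a compact sub-network. The function names `group_norms`, `prune_groups`, and `build_subnetwork` are hypothetical and chosen only for illustration.

```python
import math


def group_norms(weights):
    """Return the L2 norm of each row; each row stands for one removable structure."""
    return [math.sqrt(sum(w * w for w in row)) for row in weights]


def prune_groups(weights, target_sparsity):
    """Zero out the lowest-norm rows so that target_sparsity of all rows are zero.

    Assumption: the L2 norm is used as the saliency score. This is a common
    baseline, not the scoring rule of any specific optimizer.
    """
    num_prune = int(len(weights) * target_sparsity)
    norms = group_norms(weights)
    # Indices of rows sorted from least to most salient.
    order = sorted(range(len(weights)), key=lambda i: norms[i])
    redundant = set(order[:num_prune])
    pruned = [
        [0.0] * len(row) if i in redundant else list(row)
        for i, row in enumerate(weights)
    ]
    return pruned, sorted(redundant)


def build_subnetwork(pruned):
    """Drop all-zero rows to obtain the compact weight matrix."""
    return [row for row in pruned if any(w != 0.0 for w in row)]


# Toy weight matrix with four structures, two of which are near zero.
weights = [
    [0.9, -1.2, 0.4],
    [0.01, 0.02, -0.01],
    [1.5, 0.3, -0.7],
    [0.05, -0.03, 0.02],
]

pruned, removed = prune_groups(weights, target_sparsity=0.5)
compact = build_subnetwork(pruned)
print("removed structures:", removed)
print("compact weights:", compact)
```

In this toy setting the target sparsity is an explicit input, which loosely mirrors the abstract's point about wanting explicit rather than implicit control over sparsity exploration. A real pipeline would also have to propagate each removed structure to the dependent layers.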

Optimizing DNNs with Partially Equivalent Transformations and Automated Corrections

EINNET: Optimizing Tensor Programs with Derivation-Based Transformations

Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

OLLIE: Derivation-based Tensor Program Optimizer

DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion

MegTaiChi: Dynamic Tensor-based Memory Management Optimization for DNN Training

Optimizing DNN Computation with Relaxed Graph Substitutions

Optimizing Tensor Computation Graphs with Equality Saturation and Monte Carlo Tree Search

Uncovering Nested Data Parallelism and Data Reuse in DNN Computation with FractalTensor

Full Stack Optimization of Transformer Inference: a Survey

Tensorized NeuroEvolution of Augmenting Topologies for GPU Acceleration

Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training

TensorOpt: Exploring the Tradeoffs in Distributed DNN Training With Auto-Parallelism

DUET: Boosting Deep Neural Network Efficiency on Dual-Module Architecture

HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

Efficient and Robust Parallel DNN Training through Model Parallelism on Multi-GPU Platform

STR: Hybrid Tensor Re-Generation to Break Memory Wall for DNN Training

pommDNN: Performance optimal GPU memory management for deep neural network training

High-Performance Tensor Learning Primitives Using GPU Tensor Cores

MAGIS: Memory Optimization Via Coordinated Graph Transformation and Scheduling for DNN