Abstract: The record-breaking performance of deep neural networks (DNNs) comes with heavy parameter budgets, which necessitate external dynamic random access memory (DRAM) for storage. The prohibitive energy cost of DRAM accesses makes DNN deployment on resource-constrained devices nontrivial, calling for minimizing the movement of weights and data in order to improve energy efficiency. Driven by this critical bottleneck, we present SmartDeal, a hardware-friendly algorithm framework that trades higher-cost memory storage/access for lower-cost computation in order to aggressively boost storage and energy efficiency for both DNN inference and training. The core technique of SmartDeal is a novel DNN weight matrix decomposition framework with respective structural constraints on each matrix factor, carefully crafted to unleash the hardware-aware efficiency potential. Specifically, we decompose each weight tensor as the product of a small basis matrix and a large, structurally sparse coefficient matrix whose nonzero elements are readily quantized to powers of 2. The resulting sparse and readily quantized DNNs enjoy greatly reduced energy consumption in data movement as well as weight storage, while incurring minimal overhead to recover the original weights, thanks to the required sparse bit-operations and cost-favorable computations. Beyond inference, we take another leap to embrace energy-efficient training by introducing several customized techniques that address the unique roadblocks arising in training while preserving the SmartDeal structures. We also design a dedicated hardware accelerator that fully utilizes the new weight structure to improve real energy efficiency and latency performance. We conduct experiments on both vision and language tasks, with nine models, four datasets, and three settings (inference-only, adaptation, and fine-tuning).
Our extensive results show that 1) when applied to inference, SmartDeal achieves up to a 2.44× improvement in energy efficiency, as evaluated using real hardware implementations, and 2) when applied to training, SmartDeal can lead to 10.56× and 4.48× reductions in storage and training energy cost, respectively, with usually negligible accuracy loss, compared to state-of-the-art training baselines. Our source code is available at: https://github.com/VITA-Group/SmartDeal.
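
The decomposition described in the abstract (a small basis matrix times a large, structurally sparse coefficient matrix with power-of-2 nonzeros) can be illustrated with a minimal NumPy sketch. This is not the released SmartDeal implementation. The function names, the SVD initialization, the row-wise pruning rule, the exponent range, and the least-squares refit are all assumptions chosen for illustration only.

```python
import numpy as np

def quantize_pow2(x, min_exp=-8, max_exp=8):
    """Round each nonzero entry to a signed power of 2 (nearest in the log domain)."""
    out = np.zeros_like(x, dtype=float)
    nz = x != 0
    exp = np.clip(np.round(np.log2(np.abs(x[nz]))), min_exp, max_exp)
    out[nz] = np.sign(x[nz]) * 2.0 ** exp
    return out

def prune_rows(c, keep_ratio):
    """Structured sparsity (assumed form): zero out all but the largest-norm rows."""
    norms = np.linalg.norm(c, axis=1)
    k = max(1, int(round(keep_ratio * c.shape[0])))
    mask = np.zeros(c.shape[0], dtype=bool)
    mask[np.argsort(norms)[-k:]] = True
    return c * mask[:, None]

def decompose(w, rank, keep_ratio=0.5):
    """Approximate w (m x n) as coeff (m x rank) @ basis (rank x n).

    coeff is large, row-sparse, and power-of-2 valued; basis is small and dense.
    """
    # Hypothetical initialization: truncated SVD of the weight matrix.
    u, s, _ = np.linalg.svd(w, full_matrices=False)
    coeff = u[:, :rank] * s[:rank]
    # Impose the two structural constraints on the coefficient matrix.
    coeff = quantize_pow2(prune_rows(coeff, keep_ratio))
    # Refit the small basis by least squares with the coefficients held fixed.
    basis, *_ = np.linalg.lstsq(coeff, w, rcond=None)
    return coeff, basis

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((16, 4))
    coeff, basis = decompose(w, rank=4, keep_ratio=0.5)
    w_rebuilt = coeff @ basis  # rebuilding costs shifts and adds in hardware
    print("relative error:", np.linalg.norm(w - w_rebuilt) / np.linalg.norm(w))
```

In this sketch only the small dense basis and the sparse power-of-2 coefficients would need to be stored. The weights are rebuilt on the fly by `coeff @ basis`, which is the trade of memory access for cheap computation that the abstract describes.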

SmartLite: A DBMS-Based Serving System for DNN Inference in Resource-Constrained Environments

Smart-DNN: Efficiently Reducing the Memory Requirements of Running Deep Neural Networks on Resource-constrained Platforms

MBSNN: A Multi-Branch Scalable Neural Network for Resource-Constrained IoT Devices

Smart-DNN+: A Memory-efficient Neural Networks Compression Framework for the Model Inference

Enabling Deep Learning on Edge Devices

Toward Collaborative Inferencing of Deep Neural Networks on Internet-of-Things Devices

Memory-Efficient and Secure DNN Inference on TrustZone-enabled Consumer IoT Devices

Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters

Edge-Cloud Cooperation for DNN Inference Via Reinforcement Learning and Supervised Learning

Deploy Large-Scale Deep Neural Networks in Resource Constrained IoT Devices with Local Quantization Region

DNNOff: Offloading DNN-Based Intelligent IoT Applications in Mobile Edge Computing

EdgeKE: An On-Demand Deep Learning IoT System for Cognitive Big Data on Industrial Edge Devices

Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware

DNN Model Compression for IoT Domain-Specific Hardware Accelerators

SmartDeal: Remodeling Deep Network Weights for Efficient Inference and Training

SMaLL: A Software Framework for portable Machine Learning Libraries

Minimizing Latency for Multi-DNN Inference on Resource-Limited CPU-Only Edge Devices

Resource-Efficient Distributed Deep Neural Networks Empowered by Intelligent Software-Defined Networking

Enabling High Performance Deep Learning Networks on Embedded Systems

Energy-efficient Deployment of Deep Learning Applications on Cortex-M based Microcontrollers using Deep Compression

Edge AI: On-Demand Accelerating Deep Neural Network Inference via Edge Computing