Abstract: The capabilities and adoption of deep neural networks (DNNs) grow at an exhilarating pace. Vision models accurately classify human actions in videos and identify cancerous tissue in medical scans as precisely as human experts; large language models answer wide-ranging questions, generate code, and write prose, becoming the topic of everyday dinner-table conversations. Exciting as these uses are, the continually increasing model sizes and computational complexities have a dark side: the economic cost and negative environmental externalities of training and serving models are in evident disharmony with financial viability and climate action goals. Instead of pursuing yet another increase in predictive performance, this dissertation is dedicated to improving neural network efficiency. Specifically, a core contribution addresses efficiency during online inference. Here, the concept of Continual Inference Networks (CINs) is proposed and explored across four publications. CINs extend prior state-of-the-art methods developed for offline processing of spatio-temporal data and reuse their pre-trained weights, improving their online processing efficiency by an order of magnitude. These advances are attained through a bottom-up computational reorganization and judicious architectural modifications. The benefit to online inference is demonstrated by reformulating several widely used network architectures into CINs, including 3D CNNs, ST-GCNs, and Transformer Encoders. An orthogonal contribution tackles the concurrent adaptation and computational acceleration of a large source model into multiple lightweight derived models. Drawing on fusible adapter networks and structured pruning, Structured Pruning Adapters achieve superior predictive accuracy under aggressive pruning while using significantly fewer learned weights than fine-tuning with pruning.
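To make the idea of a computational reorganization for online inference concrete, here is a minimal sketch, assuming the simplest possible case: a single temporal convolution over a stream of scalar frames. An offline, clip-based model recomputes every output in a window of T frames whenever a new frame arrives, whereas a continual formulation keeps the most recent k frames in a buffer and computes only the one new output, reusing the same weights. The names `conv1d_valid` and `ContinualConv1d`, the weights, and the clip length are hypothetical and chosen for illustration; this is not the dissertation's implementation.

```python
from collections import deque
from typing import Deque, List, Optional


def conv1d_valid(x: List[float], w: List[float]) -> List[float]:
    """Offline 'valid' temporal convolution (cross-correlation, as in DNN layers)."""
    k = len(w)
    return [sum(w[j] * x[t + j] for j in range(k)) for t in range(len(x) - k + 1)]


class ContinualConv1d:
    """Hypothetical continual variant: keeps the last k frames in a ring buffer
    so that each incoming frame produces exactly one new output."""

    def __init__(self, weights: List[float]) -> None:
        self.w = list(weights)  # reused as-is from the "offline" layer
        self.buffer: Deque[float] = deque(maxlen=len(self.w))

    def step(self, frame: float) -> Optional[float]:
        self.buffer.append(frame)  # the oldest frame is evicted automatically
        if len(self.buffer) < len(self.w):
            return None  # warm-up: not enough history for a full kernel yet
        return sum(wj * xj for wj, xj in zip(self.w, self.buffer))


weights = [0.5, -1.0, 2.0]  # stand-in for pre-trained kernel weights (k = 3)
stream = [float(i % 7) for i in range(40)]  # toy input stream of scalar frames
T = 16  # clip length the offline model expects

cont = ContinualConv1d(weights)
offline_mults = cont_mults = 0
for t, frame in enumerate(stream):
    y_cont = cont.step(frame)
    if t + 1 >= T:
        # Offline, clip-based inference: recompute all outputs over the last T frames.
        y_clip = conv1d_valid(stream[t + 1 - T : t + 1], weights)
        offline_mults += len(y_clip) * len(weights)
        cont_mults += len(weights)
        # The newest offline output equals the single continual output.
        assert abs(y_clip[-1] - y_cont) < 1e-9

print(offline_mults, cont_mults)  # -> 1050 75
```

In this toy setting each step costs 3 multiplications instead of 42, a 14x reduction that equals T - k + 1. Padding, stride, and stacked layers are deliberately left out; handling them is beyond this sketch.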
