Abstract:Edge devices are seeing tremendous growth in sensing and computational capabilities. Running state-of-the-art deep neural network (NN) based data processing on multi-core CPU processors, embedded Graphics Processing Units (GPU), Tensor Processing Units (TPU), Neural Processing Units (NPU), Deep Learning Accelerators (DLA) etc., edge devices are now able to handle heavy data computations with limited or without cloud connectivity. In addition to hardware resources, software frameworks that optimize a trained neural network (NN) model through weight clustering and pruning, weight and input-output quantization to fewer bits, fusing NN layers etc., for more efficient execution of NN inferences on edge platforms, play an important role in making machine learning at the edge (namely EdgeML) a reality. This paper is a first effort in characterizing these software frameworks for DNN inference optimizations on edge devices, especially edge GPUs which are now ubiquitously used in all embedded deep learning systems. The interactions between software optimizations and the underlying GPU hardware is carefully examined. As most NN optimization engines are proprietary softwares with undocumented internal details in the public domain, our empirical analysis on real embedded GPU platforms using a variety of widely used DNNs, provide various interesting findings. We observe tremendous performance gain and non-negligible accuracy gain from the software optimizations, but also find highly unexpected non-deterministic behaviors such as different outputs on same inputs or increased execution latency for same NN model on more powerful hardware platforms. Application developers using these proprietary software optimization engines, would benefit from our analysis and the discussed implications of our findings, with examples from real applications like intelligent traffic intersection control and Advanced Driving Assistance Systems (ADAS). There are important implications of our findings on performance modeling and prediction research too, that focus on micro-architecture modeling based application performance prediction, but should now additionally consider optimization engines that this paper examines.

AI Tax: The Hidden Cost of AI Data Center Applications

AI-oriented Workload Allocation for Cloud-Edge Computing.

AIBench: Towards Scalable and Comprehensive Datacenter AI Benchmarking

AI on the Edge: Rethinking AI-based IoT Applications Using Specialized Edge Architectures

Optimizing Cloud Infrastructure for Real-time AI Processing: Challenges and Solutions

Sustainable AI Processing at the Edge

Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs

Efficient Hardware Acceleration Techniques for Deep Learning on Edge Devices: A Comprehensive Performance Analysis

Reaching for the Sky: Maximizing Deep Learning Inference Throughput on Edge Devices with AI Multi-Tenancy

Accelerated Cloud for Artificial Intelligence (ACAI)

AI Multi-Tenancy on Edge: Concurrent Deep Learning Model Executions and Dynamic Model Placements on Edge Devices

Power Hungry Processing: Watts Driving the Cost of AI Deployment?

3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low BitwidthQuantization, and Ultra-Low Latency Acceleration

Trends in Energy Estimates for Computing in AI/Machine Learning Accelerators, Supercomputers, and Compute-Intensive Applications

Analysing Edge Computing Devices for the Deployment of Embedded AI

The Unpaid Toll: Quantifying the Public Health Impact of AI

Empirical Measurements of AI Training Power Demand on a GPU-Accelerated Node

Hardware Accelerators for Artificial Intelligence

AI on the edge: a comprehensive review

Demystifying TensorRT: Characterizing Neural Network Inference Engine on Nvidia Edge Devices