Abstract: Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging, as it is difficult to achieve both good quality of results (QoR), such as DNN model inference accuracy, and quality of service (QoS), such as inference latency, throughput, and power consumption. Existing approaches typically separate the DNN model development step from its deployment on IoT devices, resulting in suboptimal solutions. In this paper, we first introduce a few interesting but counterintuitive observations about such a separate design approach, and empirically show why it may lead to suboptimal designs. Motivated by these observations, we then propose a novel and practical bi-directional co-design approach: a bottom-up DNN model design strategy together with a top-down flow for DNN accelerator design. It enables a joint optimization of both DNN models and their deployment configurations on IoT devices, represented here by FPGAs. We demonstrate the effectiveness of the proposed co-design approach on a real-life object detection application using a Pynq-Z1 embedded FPGA. Our method obtains state-of-the-art results on both QoR with high accuracy (IoU) and QoS with high throughput (FPS) and high energy efficiency.

What problem does this paper attempt to address?

This paper addresses the challenges of deploying deep-learning models on resource-constrained Internet of Things (IoT) devices. Specifically, it aims to optimize simultaneously the Quality of Result (QoR) of deep neural network (DNN) models, such as inference accuracy, and the Quality of Service (QoS), such as inference latency, throughput, and power consumption. Existing methods usually separate DNN model development from deployment on IoT devices, which may lead to suboptimal solutions. The paper addresses this by introducing a bi-directional co-design method that combines a bottom-up DNN model design strategy with a top-down DNN accelerator design flow, thereby jointly optimizing the DNN model and its deployment configuration on IoT devices.

The main problems identified in the paper are:

1. **Defects of independent design methods**: DNN model design and hardware accelerator design are usually carried out separately, and this separation leads to suboptimal designs. For example, a DNN model may be too complex to implement on hardware until it has been compressed by quantization, network pruning, or sparsification, and these compression operations affect the model's inference accuracy.
2. **Challenges of hardware/software configuration**: Configuration parameters such as DNN model size and hardware utilization affect the DNN and the accelerator differently, and it is difficult to balance them to achieve the best QoR and QoS. For example, even at similar model compression ratios, compressing different DNN components may lead to significantly different QoR; likewise, even a slight difference in hardware resource usage may lead to significantly different QoS.
3. **Unclear QoR upper bound for a given task**: When deploying a DNN on IoT devices, the usual practice is to first find a DNN that reaches the required QoR upper bound and then prune it to recover the QoS lost on hardware. However, complex DNNs do not always provide higher QoR than simple DNNs, which indicates that the current separate design methods may only reach suboptimal solutions.

To solve these problems, the paper proposes a bi-directional co-design method that pursues the best balance between QoS and QoR by designing the DNN model and the hardware accelerator together. The method has three main steps:

1. **Constructing and evaluating bundles**: Randomly select DNN layers and construct different "bundles", each of which is a basic building block of the generated DNN. Use an analytical model to evaluate the hardware characteristics of each bundle, such as latency, computation and memory requirements, and resource utilization, in order to estimate QoS at an early stage.
2. **Bundle selection based on QoR and QoS**: Evaluate the QoR potential of each bundle, group the bundles according to their QoS estimates, and select the bundles most likely to meet the target.
3. **Hardware-aware DNN exploration**: Explore DNNs under the given QoS and QoR constraints by stacking the selected bundles bottom-up. The DNN model is optimized with Stochastic Coordinate Descent (SCD): each generated DNN is accurately evaluated, and the result is fed back to SCD for the next update until the QoS target is met.

With this method, the paper shows that an object detection application implemented on the Pynq-Z1 embedded FPGA obtains state-of-the-art QoR and QoS results.
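The exploration step can be illustrated with a minimal Python sketch of stochastic coordinate descent over a stacked-bundle configuration. This is not the paper's implementation: the `Bundle` class, the `estimate_latency_ms` cost formula, the two coordinates (`reps` for how many times a bundle is stacked, `width` for a channel scaling factor), and all numeric values are assumptions chosen only to show the search loop. The sketch also scores candidates with the toy analytical estimate, whereas the summary above describes feeding back an accurate evaluation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    """Hypothetical building block: a short sequence of DNN layers."""
    name: str
    mmacs: float  # assumed cost of one copy at base width, in millions of MACs


def estimate_latency_ms(bundle, config, mmacs_per_ms=40.0):
    """Toy analytical model (an assumption): latency scales linearly with
    the number of stacked copies and quadratically with the width factor."""
    return bundle.mmacs * config["reps"] * config["width"] ** 2 / mmacs_per_ms


def scd_search(bundle, target_ms, tol_ms=0.5, max_iters=200, seed=0):
    """Stochastic coordinate descent on |estimated latency - target|.

    Each iteration picks ONE coordinate at random, tries a unit step in
    each direction, and keeps the first move that shrinks the gap.
    """
    rng = random.Random(seed)
    steps = {"reps": 1, "width": 0.25}   # unit move per coordinate
    lower = {"reps": 1, "width": 0.25}   # smallest allowed value
    config = {"reps": 1, "width": 1.0}   # start from a single bundle
    gap = abs(estimate_latency_ms(bundle, config) - target_ms)
    for _ in range(max_iters):
        if gap <= tol_ms:                # QoS target met, stop early
            break
        coord = rng.choice(sorted(steps))
        for direction in (+1, -1):
            trial = dict(config)
            trial[coord] = max(lower[coord],
                               config[coord] + direction * steps[coord])
            trial_gap = abs(estimate_latency_ms(bundle, trial) - target_ms)
            if trial_gap < gap:          # accept only improving moves
                config, gap = trial, trial_gap
                break
    return config, gap


if __name__ == "__main__":
    bundle = Bundle(name="bundle_a", mmacs=20.0)  # illustrative values only
    config, gap = scd_search(bundle, target_ms=10.0)
    print(config, round(gap, 3))
```

Because moves are discrete and only improving steps are accepted, the loop can stall at a configuration where no single-coordinate step helps; it returns the best configuration found and its remaining gap rather than guaranteeing that the tolerance is reached.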

A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge

High-performance Reconfigurable DNN Accelerator on a Bandwidth-limited Embedded System

Design an Efficient DNN Inference Framework with PS-PL Synergies in FPGA for Edge Computing

Resource-constrained FPGA/DNN co-design

On designing the adaptive computation framework of distributed deep learning models for Internet-of-Things applications

SECDA: Efficient Hardware/Software Co-Design of FPGA-based DNN Accelerators for Edge Inference

Invited: Algorithm-Software-Hardware Co-Design for Deep Learning Acceleration

A Novel Automate Python Edge-to-Edge: From Automated Generation on Cloud to User Application Deployment on Edge of Deep Neural Networks for Low Power IoT Systems FPGA-Based Acceleration

Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters

A Power Efficient Neural Network Implementation on Heterogeneous FPGA and GPU Devices

Deploy Large-Scale Deep Neural Networks in Resource Constrained IoT Devices with Local Quantization Region

Challenges in Energy-Efficient Deep Neural Network Training with FPGA.

Latency optimized Deep Neural Networks (DNNs): An Artificial Intelligence approach at the Edge using Multiprocessor System on Chip (MPSoC)

AdaInNet: an adaptive inference engine for distributed deep neural networks offloading in IoT-FOG applications based on reinforcement learning

Enabling Deep Learning on IoT Edge: Approaches and Evaluation

Edge Intelligence: On-Demand Deep Learning Model Co-Inference with Device-Edge Synergy

DeepThings: Distributed Adaptive Deep Learning Inference on Resource-Constrained IoT Edge Clusters

Power-Driven DNN Dataflow Optimization on FPGA

Toward Decentralized and Collaborative Deep Learning Inference for Intelligent IoT Devices

Adaptive Device-Edge Collaboration on DNN Inference in AIoT: A Digital Twin-Assisted Approach