Abstract: Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging, as it is difficult to achieve both good quality of results (QoR), such as DNN model inference accuracy, and quality of service (QoS), such as inference latency, throughput, and power consumption. Existing approaches typically separate the DNN model development step from its deployment on IoT devices, resulting in suboptimal solutions. In this paper, we first introduce a few interesting but counterintuitive observations about such a separate design approach, and empirically show why it may lead to suboptimal designs. Motivated by these observations, we then propose a novel and practical bi-directional co-design approach: a bottom-up DNN model design strategy together with a top-down flow for DNN accelerator design. It enables a joint optimization of both DNN models and their deployment configurations on IoT devices, represented here by FPGAs. We demonstrate the effectiveness of the proposed co-design approach on a real-life object detection application using a Pynq-Z1 embedded FPGA. Our method obtains state-of-the-art results on both QoR, with high accuracy (IoU), and QoS, with high throughput (FPS) and high energy efficiency.
What problem does this paper attempt to address?
This paper addresses the challenge of deploying deep learning models on resource-constrained Internet-of-Things (IoT) devices. Specifically, it aims to optimize two things at once: the quality of results (QoR) of a deep neural network (DNN), such as inference accuracy, and the quality of service (QoS), such as inference latency, throughput, and power consumption. Existing methods usually separate DNN model development from deployment on IoT devices, which may lead to suboptimal solutions. The paper tackles this with a bi-directional co-design method that combines a bottom-up DNN model design strategy with a top-down DNN accelerator design flow, jointly optimizing the DNN model and its deployment configuration on IoT devices.
The main problems mentioned in the paper include:
1. **Drawbacks of separate design**: DNN model design and hardware accelerator design are usually carried out independently, and this separation leads to suboptimal designs. For example, a DNN model may be too complex to implement on the target hardware until it is compressed by quantization, network pruning, or sparsification, and these compression steps affect the model's inference accuracy.
2. **Difficulty of balancing hardware/software configurations**: Configuration parameters such as DNN model size and hardware utilization affect the DNN and the accelerator differently, so it is hard to balance them to achieve the best QoR and QoS. For example, compressing different DNN components can produce significantly different QoR even at similar compression ratios; likewise, a slight difference in hardware resource usage can produce significantly different QoS.
3. **Unclear QoR upper bound for a given task**: When deploying a DNN on an IoT device, the usual practice is to first find a DNN that reaches the QoR upper bound and then prune it to recover the QoS lost on hardware. However, complex DNNs do not always provide higher QoR than simple ones, which suggests that the separate design flow may reach only suboptimal solutions.
To solve these problems, the paper proposes a bi-directional co-design method that seeks the best balance between QoS and QoR by designing the DNN model and the hardware accelerator together. The method has three main steps:
1. **Constructing and evaluating bundles**: Randomly select DNN layers and combine them into different "bundles", each of which serves as a basic building block of the generated DNN. Use an analytical model to evaluate the hardware characteristics of each bundle, such as latency, computation and memory requirements, and resource utilization, so that QoS can be estimated at an early stage.
2. **Bundle selection based on QoR and QoS**: Evaluate the QoR potential of each bundle, group the bundles according to their QoS estimates, and select the bundles most likely to meet the target.
3. **Hardware-aware DNN exploration**: Explore DNNs under the given QoS and QoR constraints by stacking the selected bundles bottom-up. Optimize the DNN model using Stochastic Coordinate Descent (SCD): each generated DNN is evaluated accurately, and the result is fed back to SCD to update the model until the QoS target is met.
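The selection and search steps above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the bundle names, latency figures, proxy accuracy scores, the three search coordinates (`replications`, `width`, `downsamples`), and the toy latency formula are all assumptions made for illustration, and the same toy estimate stands in for both the early analytical model and the accurate evaluation that the paper feeds back to SCD.

```python
import random
from collections import defaultdict

# Hypothetical bundles: (name, estimated latency in ms, proxy accuracy score).
# All numbers are made up for illustration.
BUNDLES = [
    ("conv3x3+relu", 4.0, 0.52),
    ("dwconv3x3+conv1x1", 1.5, 0.55),
    ("conv1x1+relu", 1.0, 0.40),
    ("conv5x5+relu", 9.0, 0.58),
    ("dwconv5x5+conv1x1", 4.5, 0.57),
]

def select_bundles(bundles, bin_ms=4.0):
    """Step 2: group bundles into latency bins, keep the best-scoring one per bin."""
    groups = defaultdict(list)
    for name, latency_ms, score in bundles:
        groups[int(latency_ms // bin_ms)].append((score, name))
    return [max(group)[1] for _, group in sorted(groups.items())]

def estimate_latency_ms(cfg, bundle_ms):
    """Toy analytical model (assumption): latency grows with depth and the
    square of the width multiplier, and halves with each down-sampling stage."""
    return bundle_ms * cfg["replications"] * cfg["width"] ** 2 / 2 ** cfg["downsamples"]

# Allowed (min, max, step) for each search coordinate; illustrative values.
BOUNDS = {
    "replications": (1, 8, 1),
    "width": (1.0, 4.0, 0.25),
    "downsamples": (1, 4, 1),
}

def scd_search(cfg, bundle_ms, target_ms, tol_ms=1.0, max_iters=200, seed=0):
    """Step 3: pick one coordinate at random per iteration and take the single
    step along it that most reduces the gap to the latency target."""
    rng = random.Random(seed)
    cfg = dict(cfg)  # work on a copy so the caller's config is not mutated

    def gap(c):
        return abs(estimate_latency_ms(c, bundle_ms) - target_ms)

    for _ in range(max_iters):
        if gap(cfg) <= tol_ms:
            break  # QoS target met within tolerance
        key = rng.choice(sorted(BOUNDS))
        lo, hi, step = BOUNDS[key]
        candidates = [
            {**cfg, key: cfg[key] + delta}
            for delta in (-step, step)
            if lo <= cfg[key] + delta <= hi
        ]
        best = min(candidates, key=gap, default=cfg)
        if gap(best) < gap(cfg):  # accept only moves that improve the gap
            cfg = best
    return cfg

if __name__ == "__main__":
    latency_of = {name: ms for name, ms, _ in BUNDLES}
    start = {"replications": 4, "width": 2.0, "downsamples": 2}
    for name in select_bundles(BUNDLES):
        found = scd_search(start, latency_of[name], target_ms=30.0)
        print(name, found, estimate_latency_ms(found, latency_of[name]))
```

Because each move is accepted only when it narrows the gap, the search never ends further from the target than it started, but it can stall before reaching the tolerance. A real flow would replace the toy estimate with measured or more accurate QoS figures and would also track QoR for each candidate.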
Using this method, the paper shows that, for an object detection application implemented on the Pynq-Z1 embedded FPGA, the proposed co-design approach achieves state-of-the-art QoR and QoS results.