Abstract: Deep neural networks (DNNs), as the basis of object detection, will play a key role in the development of future autonomous systems with full autonomy. Such systems require real-time, energy-efficient implementations of DNNs on a power-budgeted platform. Two research thrusts are dedicated to enhancing the performance and energy efficiency of the inference phase of DNNs: the first is model compression techniques, and the second is efficient hardware implementations. Recent research on extremely-low-bit CNNs, such as the binary neural network (BNN) and XNOR-Net, replaces traditional floating-point operations with binary bit operations. This significantly reduces memory bandwidth and storage requirements, but it suffers non-negligible accuracy loss and wastes digital signal processing (DSP) blocks on FPGAs. To overcome these limitations, this paper proposes REQ-YOLO, a resource-aware, systematic weight quantization framework for object detection that considers both algorithm and hardware resource aspects. We adopt the block-circulant matrix method and propose a heterogeneous weight quantization using the Alternating Direction Method of Multipliers (ADMM), an effective optimization technique for general, non-convex optimization problems. To achieve real-time, highly efficient implementations on FPGA, we present the detailed hardware implementation of block-circulant matrices on CONV layers and develop an efficient processing element (PE) structure supporting the heterogeneous weight quantization, CONV dataflow and pipelining techniques, design optimization, and a template-based automatic synthesis framework to optimally exploit hardware resources. Experimental results show that our proposed REQ-YOLO framework can significantly compress the YOLO model while introducing very small accuracy degradation. The related code is available at https://github.com/Anonymous788/heterogeneous_ADMM_YOLO.
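As a rough illustration of the block-circulant matrix method named in the abstract, the sketch below multiplies a block-circulant weight matrix by a vector using FFTs. It is a minimal sketch and not the REQ-YOLO implementation: it uses a plain matrix-vector product rather than a CONV layer, and the function names (`circulant`, `dense_from_blocks`, `block_circulant_matvec`), the block size, and the array layout are all assumptions made for illustration. Each b-by-b block is fully described by one length-b vector, so a layer with p-by-q blocks stores p·q·b values instead of p·q·b².

```python
import numpy as np


def circulant(c):
    """Build the dense circulant matrix whose first column is c (C[i, j] = c[(i - j) mod n])."""
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]


def dense_from_blocks(w):
    """Expand block-defining vectors w of shape (p, q, b) into a dense (p*b, q*b) matrix.

    Only used here as a reference to check the FFT-based product.
    """
    p, q, _ = w.shape
    return np.block([[circulant(w[i, j]) for j in range(q)] for i in range(p)])


def block_circulant_matvec(w, x):
    """Compute W @ x where W is block-circulant, without forming W.

    w: (p, q, b) array, w[i, j] is the first column of block (i, j).
    x: (q*b,) input vector.
    A circulant block times a vector is a circular convolution, which
    becomes an element-wise product in the frequency domain.
    """
    p, q, b = w.shape
    x_blocks = x.reshape(q, b)
    w_f = np.fft.fft(w, axis=-1)          # (p, q, b)
    x_f = np.fft.fft(x_blocks, axis=-1)   # (q, b)
    y_f = (w_f * x_f[None, :, :]).sum(axis=1)  # accumulate over input blocks
    return np.fft.ifft(y_f, axis=-1).real.reshape(p * b)


# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
w = rng.standard_normal((2, 3, 4))   # 2x3 blocks, block size 4
x = rng.standard_normal(3 * 4)
y_fft = block_circulant_matvec(w, x)
y_dense = dense_from_blocks(w) @ x
```

In this sketch `y_fft` and `y_dense` agree up to floating-point error, while `w` holds 24 values and the equivalent dense matrix holds 96.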
