Abstract: This paper presents a benchmark analysis of NVIDIA Jetson platforms running deep learning-based 3D object detection frameworks. Three-dimensional (3D) object detection is highly beneficial for the autonomous navigation of robotic platforms such as autonomous vehicles, robots, and drones. Because it provides one-shot inference that extracts the 3D positions, with depth information, and the heading directions of neighboring objects, robots can generate a reliable, collision-free path. To make 3D object detection practical, several approaches have been developed that build detectors using deep learning for fast and accurate inference. In this paper, we investigate 3D object detectors and analyze their performance on the NVIDIA Jetson series, which contains an onboard graphics processing unit (GPU) for deep learning computation. Since robotic platforms often require real-time control to avoid dynamic obstacles, onboard processing with a built-in computer is an emerging trend. The Jetson series satisfies such requirements with a compact board size and suitable computational performance for autonomous navigation. However, the Jetson has not yet been extensively benchmarked on computationally expensive tasks such as point cloud processing. To examine the Jetson series on such tasks, we tested the performance of all commercially available boards (i.e., Nano, TX2, NX, and AGX) with state-of-the-art 3D object detectors. We also evaluated the effect of the TensorRT library, which optimizes a deep learning model for faster inference and lower resource utilization on the Jetson platforms. We present benchmark results in terms of three metrics: detection accuracy, frames per second (FPS), and resource usage with power consumption. From the experiments, we observe that all Jetson boards, on average, consume over 80% of GPU resources. Moreover, TensorRT remarkably increases inference speed (i.e., four times faster) and halves central processing unit (CPU) and memory consumption. By analyzing these metrics in detail, we establish research foundations on edge device-based 3D object detection for the efficient operation of various robotic applications.
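As a rough illustration of the FPS metric and the speedup ratio mentioned in the abstract, the following minimal sketch times a detector callable over a set of frames. It is not the authors' benchmarking code: `measure_fps`, `speedup`, and `dummy_detector` are hypothetical names introduced here, and the detector is a stand-in that only sleeps. On a real GPU, execution is typically asynchronous, so a synchronization call would be needed before reading the clock.

```python
import time
from statistics import mean


def measure_fps(infer, frames, warmup=2):
    """Return (fps, mean_latency_s) for calling `infer` once per frame.

    `infer` is any callable taking one frame (hypothetical interface).
    The first `warmup` frames are run untimed so that one-off setup
    costs do not distort the average.
    """
    for frame in frames[:warmup]:
        infer(frame)

    latencies = []
    for frame in frames:
        start = time.perf_counter()
        infer(frame)  # on a GPU, synchronize here before stopping the clock
        latencies.append(time.perf_counter() - start)

    avg = mean(latencies)
    return 1.0 / avg, avg


def speedup(fps_baseline, fps_optimized):
    """Ratio of optimized to baseline throughput (4.0 means four times faster)."""
    return fps_optimized / fps_baseline


def dummy_detector(frame):
    """Stand-in for a 3D object detector (assumption, not a real model)."""
    time.sleep(0.001)
    return []


# Each "frame" is just a placeholder list here, not a real point cloud.
frames = [[0.0, 0.0, 0.0] for _ in range(20)]
fps, latency = measure_fps(dummy_detector, frames)
print(f"{fps:.1f} FPS, {latency * 1000:.2f} ms per frame")
```

Running the same measurement once on a baseline model and once on an optimized one, then passing both FPS values to `speedup`, gives the kind of ratio the abstract reports.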

TensorRT-based Framework and Optimization Methodology for Deep Learning Inference on Jetson Boards

High-performance Reconfigurable DNN Accelerator on a Bandwidth-limited Embedded System

A Deep Learning Framework Performance Evaluation to Use YOLO in Nvidia Jetson Platform

Acceleration of Transformer Architectures on Jetson Xavier Using TensorRT

Efficient CUDA stream management for multi-DNN real-time inference on embedded GPUs

Woodpecker-DL: Accelerating Deep Neural Networks via Hardware-Aware Multifaceted Optimizations

TensorRT Powered Model for Ultra-Fast Li-Ion Battery Capacity Prediction on Embedded Devices

Demystifying TensorRT: Characterizing Neural Network Inference Engine on Nvidia Edge Devices

Run Your 3D Object Detector on NVIDIA Jetson Platforms: A Benchmark Analysis

Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices

A High-Performance Dataflow-Centric Optimization Framework for Deep Learning Inference on the Edge

Optimizing Monocular Driving Assistance for Real-Time Processing on Jetson AGX Xavier

Reaching for the Sky: Maximizing Deep Learning Inference Throughput on Edge Devices with AI Multi-Tenancy

An Optimization Toolchain Design Of Deep Learning Deployment Based On Heterogeneous Computing Platform

High performance and energy efficient inference for deep learning on ARM processors

Incremental Training and Group Convolution Pruning for Runtime DNN Performance Scaling on Heterogeneous Embedded Platforms

A Fine-Grained End-to-End Latency Optimization Framework for Wireless Collaborative Inference

Joint Architecture Design and Workload Partitioning for DNN Inference on Industrial IoT Clusters

Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System

Empowering In-Browser Deep Learning Inference on Edge Through Just-In-Time Kernel Optimization