Feature Compression for Cloud-Edge Multimodal 3D Object Detection

Chongzhen Tian, Zhengxin Li, Hui Yuan, Raouf Hamzaoui, Liquan Shen, Sam Kwong

2024-09-06

Abstract: Machine vision systems, which can efficiently manage extensive visual perception tasks, are becoming increasingly popular in industrial production and daily life. Due to the challenge of simultaneously obtaining accurate depth and texture information with a single sensor, multimodal data captured by cameras and LiDAR is commonly used to enhance performance. Additionally, cloud-edge cooperation has emerged as a novel computing approach to improve user experience and ensure data security in machine vision systems. This paper proposes a pioneering solution to address the feature compression problem in multimodal 3D object detection. Given a sparse tensor-based object detection network at the edge device, we introduce two modes to accommodate different application requirements: Transmission-Friendly Feature Compression (T-FFC) and Accuracy-Friendly Feature Compression (A-FFC). In T-FFC mode, only the output of the last layer of the network's backbone is transmitted from the edge device. The received feature is processed at the cloud device through a channel expansion module and two spatial upsampling modules to generate multi-scale features. In A-FFC mode, we expand upon the T-FFC mode by transmitting two additional types of features. These added features enable the cloud device to generate more accurate multi-scale features. Experimental results on the KITTI dataset using the VirConv-L detection network showed that T-FFC was able to compress the features by a factor of 6061 with less than a 3% reduction in detection performance. On the other hand, A-FFC compressed the features by a factor of about 901 with almost no degradation in detection performance. We also designed optional residual extraction and 3D object reconstruction modules to facilitate the reconstruction of detected objects. The reconstructed objects effectively reflected details of the original objects.
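
To put the two reported compression factors side by side, the following minimal sketch works through the arithmetic. It assumes that "compress by a factor of N" means the uncompressed feature size divided by the transmitted size; the 100 MiB raw feature size and the names `compressed_size` and `RAW_FEATURE_BYTES` are hypothetical and chosen only for illustration.

```python
# Compression factors reported in the abstract (KITTI, VirConv-L).
T_FFC_FACTOR = 6061
A_FFC_FACTOR = 901  # reported as "about 901"


def compressed_size(raw_bytes: float, factor: float) -> float:
    """Size after compression, given the uncompressed size and the factor."""
    return raw_bytes / factor


# Hypothetical uncompressed feature size (100 MiB); not a figure from the paper.
RAW_FEATURE_BYTES = 100 * 1024 * 1024

t_ffc_bytes = compressed_size(RAW_FEATURE_BYTES, T_FFC_FACTOR)
a_ffc_bytes = compressed_size(RAW_FEATURE_BYTES, A_FFC_FACTOR)

# How many times more data A-FFC transmits than T-FFC.
# This ratio does not depend on the assumed raw size.
a_to_t_ratio = a_ffc_bytes / t_ffc_bytes

print(f"T-FFC: {t_ffc_bytes / 1024:.1f} KiB")
print(f"A-FFC: {a_ffc_bytes / 1024:.1f} KiB")
print(f"A-FFC / T-FFC: {a_to_t_ratio:.2f}x")
```

Under this reading, A-FFC transmits roughly 6.7 times as much data as T-FFC for the same input, which is the price paid for recovering the last few percent of detection performance.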

Image and Video Processing

What problem does this paper attempt to address?

The paper addresses feature compression in multimodal 3D object detection. Specifically, the researchers propose two feature compression modes to suit different application needs:

1. **Transmission-Friendly Feature Compression (T-FFC)**:
   - In this mode, only the output of the last layer of the network backbone is transmitted from the edge device.
   - On the cloud device, the received features are processed by a channel expansion module and two spatial upsampling modules to generate multi-scale features.
   - Experimental results show that T-FFC compresses features by a factor of 6061 while reducing detection performance by less than 3%.
2. **Accuracy-Friendly Feature Compression (A-FFC)**:
   - This mode extends T-FFC by transmitting two additional types of features.
   - The additional features enable the cloud device to generate more accurate multi-scale features, achieving a compression factor of about 901 with almost no loss in detection performance.

Additionally, the study designs optional residual extraction and 3D object reconstruction modules to facilitate the reconstruction of detected objects. These reconstructed objects effectively reflect the shape, occlusion, and detail information of the original objects.

The focus of the paper is on how to efficiently compress features of multimodal data (image and point cloud data) to support collaboration between edge computing and cloud computing and to reduce the data transmission load.
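
The cloud-side T-FFC steps described above (one channel expansion followed by two spatial upsamplings) can be illustrated with a toy sketch. This is not the authors' implementation: the paper operates on sparse tensor features with learned modules, whereas this sketch uses small dense `[C][H][W]` nested lists, a fixed per-location channel-mixing matrix in place of the channel expansion module, and nearest-neighbour 2x upsampling in place of the spatial upsampling modules. The function names, the 2x scale factors, and the output keys are all assumptions made for illustration.

```python
def expand_channels(feat, weights):
    """Mix C_in input channels into C_out output channels at every location.

    feat:    nested list shaped [C_in][H][W]
    weights: nested list shaped [C_out][C_in] (hypothetical fixed weights)
    """
    c_in, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [
        [[sum(wk[c] * feat[c][y][x] for c in range(c_in)) for x in range(w)]
         for y in range(h)]
        for wk in weights
    ]


def upsample2x(feat):
    """Nearest-neighbour 2x spatial upsampling of a [C][H][W] nested list."""
    out = []
    for channel in feat:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]  # repeat each column
            rows.append(wide)
            rows.append(list(wide))                    # repeat each row
        out.append(rows)
    return out


def tffc_cloud_decoder(last_layer_feat, weights):
    """Build three feature scales from the single transmitted feature."""
    coarse = expand_channels(last_layer_feat, weights)  # channel expansion
    mid = upsample2x(coarse)                            # first upsampling
    fine = upsample2x(mid)                              # second upsampling
    return {"scale_1x": coarse, "scale_2x": mid, "scale_4x": fine}


# Toy input: one channel on a 2x2 grid, expanded to two channels.
received = [[[1.0, 2.0], [3.0, 4.0]]]
scales = tffc_cloud_decoder(received, weights=[[1.0], [2.0]])
for name, feat in scales.items():
    print(name, len(feat), "channels,", len(feat[0]), "x", len(feat[0][0]))
```

The point of the sketch is the data flow: only one tensor crosses the network, and every other scale the detector needs is regenerated on the cloud side. In A-FFC, the two additional transmitted features would give the cloud more information to refine these regenerated scales.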

