Abstract: Learning-based image compression has been shown to achieve performance competitive with state-of-the-art transform-based codecs. This has motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to these emerging standards is the development of learning-based image compression systems targeting both humans and machines. This paper is concerned with learning-based compression schemes whose compressed-domain representations can be used to perform visual processing and computer vision tasks directly in the compressed domain. In our work, we adopt a learning-based compressed-domain classification framework for performing visual recognition using the compressed-domain latent representation at varying bit-rates. We propose a novel feature adaptation module integrating a lightweight attention model to adaptively emphasize and enhance the key features within the extracted channel-wise information. We also design an adaptation training strategy to utilize the pretrained pixel-domain weights. For comparison, in addition to the performance results obtained using our proposed latent-based compressed-domain method, we also present performance results using compressed but fully decoded images in the pixel domain as well as original uncompressed images. The results show that our proposed compressed-domain classification model clearly outperforms the existing compressed-domain classification models, and that it yields similar accuracy with much higher computational efficiency compared to pixel-domain models trained using fully decoded images.

What problem does this paper attempt to address?

The paper addresses how to perform visual recognition and computer vision tasks, image classification in particular, directly in the compressed domain. Specifically, the authors focus on learning-based compression schemes whose compressed-domain representations can be used for visual processing and computer vision tasks without fully decoding the image. Skipping the decoding step saves computing resources and improves efficiency, especially at high compression ratios.

The main contribution is a new Feature Adaptation (FA) module, which integrates a lightweight attention model to adaptively emphasize and enhance the key features in the extracted channel-wise information. The authors also design an adaptation training strategy that uses pretrained pixel-domain weights to improve the performance of the compressed-domain classification model. In experiments, the proposed compressed-domain classification model significantly outperforms the existing compressed-domain classification models, and it achieves accuracy similar to pixel-domain models trained on fully decoded images at much higher computational efficiency.

The key technical points mentioned in the paper include:

- **Learning-based Compression Model**: A deep-learning-based image compression algorithm that can compete with traditional compression methods in compression efficiency and can support image processing and computer vision tasks in the compressed domain.
- **Feature Adaptation Module**: It consists of two parts: the Channel-wise Attention Unit (CAU) and the Feature Enhancement Unit (FEU). The CAU learns channel-level affine transformation vectors that select and realign the compressed-domain channels; the FEU enhances the useful features within the selected channels through a cross-channel convolutional layer.
- **Adaptation Training Strategy**: Start from the weights of the pretrained pixel-domain model, freeze those weights and update only the parameters of the inserted FA module, then unfreeze and fine-tune the entire network for compressed-domain recognition.

Through these innovations, the paper provides an effective method for efficient visual recognition in the compressed domain, which matters for real-time applications and resource-constrained devices.
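
To make the Feature Adaptation module and the two-stage training schedule more concrete, here is a minimal NumPy sketch of the forward pass. It is not the authors' implementation: this summary does not give the layer sizes, activations, or exact wiring, so the bottleneck MLP, the sigmoid gate, the residual 1x1 channel mix, and every name (`init_fa_params`, `channel_attention_unit`, `feature_enhancement_unit`, `trainable_groups`) are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_fa_params(channels, hidden):
    """Randomly initialise the (hypothetical) FA module parameters."""
    return {
        # CAU: small bottleneck MLP mapping channel statistics to scale and shift
        "w1": rng.normal(0.0, 0.1, (hidden, channels)),
        "w_scale": rng.normal(0.0, 0.1, (channels, hidden)),
        "w_shift": rng.normal(0.0, 0.1, (channels, hidden)),
        # FEU: a 1x1 convolution is just a channel-mixing matrix
        "w_mix": rng.normal(0.0, 0.1, (channels, channels)),
    }


def channel_attention_unit(latent, p):
    """Apply a per-channel affine transform predicted from channel statistics."""
    stats = latent.mean(axis=(1, 2))                  # (C,) global average pool
    hidden = np.maximum(p["w1"] @ stats, 0.0)         # ReLU bottleneck (assumed)
    scale = 1.0 / (1.0 + np.exp(-(p["w_scale"] @ hidden)))  # sigmoid gate (assumed)
    shift = p["w_shift"] @ hidden
    return latent * scale[:, None, None] + shift[:, None, None]


def feature_enhancement_unit(x, p):
    """Mix information across channels with a 1x1 convolution plus a residual path."""
    mixed = np.einsum("oc,chw->ohw", p["w_mix"], x)
    return x + np.maximum(mixed, 0.0)


def feature_adaptation(latent, p):
    """CAU followed by FEU, applied to a (C, H, W) compressed-domain latent."""
    return feature_enhancement_unit(channel_attention_unit(latent, p), p)


def trainable_groups(stage):
    """Stage 1 trains only the FA module; stage 2 fine-tunes the whole network."""
    if stage == 1:
        return {"fa_module": True, "backbone": False}
    if stage == 2:
        return {"fa_module": True, "backbone": True}
    raise ValueError("stage must be 1 or 2")


latent = rng.normal(size=(8, 4, 4))   # hypothetical latent: 8 channels, 4x4 spatial
params = init_fa_params(channels=8, hidden=4)
adapted = feature_adaptation(latent, params)
print(adapted.shape)                  # (8, 4, 4)
```

The adapted tensor keeps the latent's shape, so under these assumptions it could be fed to a classifier backbone initialised from pixel-domain weights; `trainable_groups` only records which parameter groups would receive gradient updates in each stage.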

DNN-Compressed Domain Visual Recognition with Feature Adaptation

Learned Image Compression for Both Humans and Machines Via Dynamic Adaptation

Analysis on Compressed Domain: A Multi-Task Learning Approach

Adaptive Compression for Online Computer Vision: an Edge Reinforcement Learning Approach

2C-Net: integrate image compression and classification via deep neural network

Unified Architecture Adaptation for Compressed Domain Semantic Inference

AdaCompress: Adaptive Compression for Online Computer Vision Services

Dynamic Low-Rank Instance Adaptation for Universal Neural Image Compression

Learned Image Compression for Machine Perception

A Unified Efficient Deep Image Compression Framework and Its Application on Human-Centric Task

Deep learning-based Edge-aware pre and post-processing methods for JPEG compressed images

Learned image and video compression with deep neural networks

DeepN-JPEG: A Deep Neural Network Favorable JPEG-based Image Compression Framework

Exploring Compressed Image Representation as a Perceptual Proxy: A Study

A Unified End-to-End Framework for Efficient Deep Image Compression

Deep Learning for Visual Data Compression

Toward Scalable Image Feature Compression: A Content-Adaptive and Diffusion-Based Approach

Color Learning for Image Compression

Region-of-interest and channel attention-based joint optimization of image compression and computer vision

Multiscale Progressive Image Compression Network Guided by Learnable Just Noticeable Distortion

Recognition-Aware Learned Image Compression