DNN-Compressed Domain Visual Recognition with Feature Adaptation

Yingpeng Deng,Lina J. Karam
2023-07-26
Abstract:Learning-based image compression was shown to achieve a competitive performance with state-of-the-art transform-based codecs. This motivated the development of new learning-based visual compression standards such as JPEG-AI. Of particular interest to these emerging standards is the development of learning-based image compression systems targeting both humans and machines. This paper is concerned with learning-based compression schemes whose compressed-domain representations can be utilized to perform visual processing and computer vision tasks directly in the compressed domain. In our work, we adopt a learning-based compressed-domain classification framework for performing visual recognition using the compressed-domain latent representation at varying bit-rates. We propose a novel feature adaptation module integrating a lightweight attention model to adaptively emphasize and enhance the key features within the extracted channel-wise information. Also, we design an adaptation training strategy to utilize the pretrained pixel-domain weights. For comparison, in addition to the performance results that are obtained using our proposed latent-based compressed-domain method, we also present performance results using compressed but fully decoded images in the pixel domain as well as original uncompressed images. The obtained performance results show that our proposed compressed-domain classification model can distinctly outperform the existing compressed-domain classification models, and that it can also yield similar accuracy results with a much higher computational efficiency as compared to the pixel-domain models that are trained using fully decoded images.
Computer Vision and Pattern Recognition,Image and Video Processing
What problem does this paper attempt to address?
The problem that this paper attempts to solve is to directly perform visual recognition and computer vision tasks in the compressed domain, especially image classification. Specifically, the author focuses on learning - based compression schemes, whose compressed - domain representations can be used to directly perform visual processing and computer vision tasks in the compressed domain without fully decoding the image. This can not only save computing resources but also improve efficiency, especially in the case of high compression ratios. The main contribution of the paper lies in proposing a new Feature Adaptation module, which integrates a lightweight attention model to adaptively emphasize and enhance the key features in the extracted channel information. In addition, the author also designs an adaptive training strategy, using pre - trained pixel - domain weights to improve the performance of the compressed - domain classification model. Verified by experiments, the proposed compressed - domain classification model is not only significantly better than the existing compressed - domain classification models, but also has significantly higher computational efficiency than the pixel - domain models trained using fully decoded images, while being able to achieve similar accuracy. The key technical points mentioned in the paper include: - **Learning - based Compression Model**: A deep - learning - based image compression algorithm that can compete with traditional compression methods in compression efficiency and can support image processing and computer vision tasks in the compressed domain. - **Feature Adaptation Module**: It consists of two parts - the Channel - wise Attention Unit (CAU) and the Feature Enhancement Unit (FEU). The CAU aims to learn affine transformation vectors at the channel level, select and realign the compressed - domain channels; the FEU enhances the useful features within the selected channels through the cross - channel convolutional layer. - **Adaptive Training Strategy**: Utilize the weights of the pre - trained pixel - domain model, freeze the adopted weights, only update the inserted FA module parameters, and then unfreeze and fine - tune the entire network for compressed - domain recognition. Through these innovations, the paper provides an effective method for efficient visual recognition in the compressed domain, which is of great significance for real - time applications and resource - constrained devices.