Abstract: Image coding for multi-task applications, catering to both human perception and machine vision, has been extensively investigated. Existing methods often rely on multiple task-specific encoder-decoder pairs, leading to high parameter and bitrate overhead, or face challenges in multi-objective optimization under a unified representation, failing to achieve both performance and efficiency. To this end, we propose Multi-Path Aggregation (MPA), integrated into existing coding models for joint human-machine vision, which unifies the feature representation with an all-in-one architecture. MPA employs a predictor to allocate latent features among task-specific paths based on feature importance, which varies across tasks, maximizing the utility of shared features while preserving task-specific features for subsequent refinement. Leveraging feature correlations, we develop a two-stage optimization strategy to alleviate multi-task performance degradation. By reusing shared features, as few as 1.89% of the parameters are further augmented and fine-tuned for a specific task, which completely avoids extensive optimization of the entire model. Experimental results show that MPA achieves performance comparable to state-of-the-art methods in both task-specific and multi-objective optimization across human viewing and machine analysis tasks. Moreover, our all-in-one design supports seamless transitions between human- and machine-oriented reconstruction, enabling task-controllable interpretation without altering the unified model. Code is available at https://github.com/NJUVISION/MPA.
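The allocation step described in the abstract can be pictured with a minimal sketch. It assumes the predictor has already produced one importance score per latent channel for the target task, and that a hypothetical `ratio` parameter decides what fraction of the highest-scoring channels is routed to the task-specific path while the rest stay on the shared path. The function names, the per-channel granularity, and the top-k rule are illustrative assumptions, not the MPA implementation in the linked repository; the "paths" are plain Python callables standing in for network branches.

```python
from typing import Callable, List, Sequence, Tuple


def allocate_paths(importance: Sequence[float], ratio: float) -> Tuple[List[int], List[int]]:
    """Split channel indices into (task_specific, shared) by importance.

    Hypothetical rule: the round(ratio * C) channels with the highest
    importance scores go to the task-specific path; all others stay shared.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    k = round(ratio * len(importance))
    # Rank channels from most to least important (ties keep original order).
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k]), sorted(ranked[k:])


def aggregate(
    latent: Sequence[float],
    task_specific: List[int],
    shared_path: Callable[[float], float],
    specific_path: Callable[[float], float],
) -> List[float]:
    """Run each channel through its assigned path and reassemble in order."""
    chosen = set(task_specific)
    return [specific_path(x) if i in chosen else shared_path(x) for i, x in enumerate(latent)]


# Usage: channels 0 and 3 are the most important, so with ratio=0.5 they
# take the stand-in task-specific path, which here scales values by 10.
specific, shared = allocate_paths([0.9, 0.1, 0.5, 0.7], ratio=0.5)
print(specific, shared)  # [0, 3] [1, 2]
print(aggregate([1.0, 2.0, 3.0, 4.0], specific, lambda x: x, lambda x: 10 * x))
# [10.0, 2.0, 3.0, 40.0]
```

Sweeping `ratio` from 0 to 1 moves channels from the shared path to the task-specific path without touching any weights, which is one way to picture the seamless, task-controllable transition the abstract mentions; how MPA actually parameterizes that control should be checked against the paper and its code.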


### What problem does this paper attempt to solve?

This paper addresses image coding for multi-task applications, where a single compressed representation must serve both human perception and machine vision. Existing methods either rely on multiple task-specific encoder-decoder pairs, which incurs high parameter and bitrate overhead, or optimize multiple objectives under a unified representation, which is difficult and fails to deliver both performance and efficiency. To address these challenges, the authors propose **Multi-Path Aggregation (MPA)**, which is integrated into existing coding models to support joint human-machine vision.

#### Main problems:

1. **High parameter and bitrate overhead**: Existing methods rely on multiple task-specific encoder-decoder pairs, resulting in excessive parameter and bitrate usage.
2. **Difficulty in multi-objective optimization**: When multiple tasks are optimized under a unified representation, it is hard to balance their requirements, which degrades performance.
3. **Lack of flexibility**: Existing methods struggle to switch between tasks while keeping task-specific optimization efficient.

#### Solutions:

- **Multi-Path Aggregation (MPA)**: A predictor allocates latent features among task-specific paths according to their importance for each task, maximizing the use of shared features while preserving task-specific features for subsequent refinement.
- **Two-stage optimization strategy**: Exploiting feature correlations, a two-stage strategy reuses the shared features and fine-tunes only a small set of added parameters (as few as 1.89% of the parameters) for a specific task, avoiding extensive optimization of the entire model.
- **Task-controllable interpretation**: A single unified model supports seamless transitions between human- and machine-oriented reconstruction, so one representation can be interpreted in different ways without altering the model.

Through these designs, MPA achieves performance comparable to state-of-the-art methods in multi-task coding while reducing parameter and bitrate overhead and improving the efficiency of multi-task collaboration.
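The parameter bookkeeping behind the two-stage strategy can be sketched as follows, under assumptions stated up front: stage one trains every parameter of a shared model, and stage two freezes those parameters and adds a small trainable task-specific group. In a framework such as PyTorch, freezing would correspond to setting `requires_grad` to `False` on the shared parameters. The group names (`encoder`, `decoder`, `entropy_model`, `machine_path`) and their sizes are hypothetical, chosen for round numbers; they are not taken from the paper, and the resulting 2% is unrelated to the reported 1.89%.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ParamGroup:
    size: int        # number of scalar parameters in this group
    trainable: bool  # whether the optimizer updates this group


def stage_one(sizes: Dict[str, int]) -> Dict[str, ParamGroup]:
    """Stage 1 (assumed): every parameter of the shared model is trained."""
    return {name: ParamGroup(size, trainable=True) for name, size in sizes.items()}


def stage_two(model: Dict[str, ParamGroup], added: Dict[str, int]) -> Dict[str, ParamGroup]:
    """Stage 2 (assumed): freeze all shared groups, then add new trainable ones."""
    tuned = {name: ParamGroup(g.size, trainable=False) for name, g in model.items()}
    for name, size in added.items():
        tuned[name] = ParamGroup(size, trainable=True)
    return tuned


def trainable_fraction(model: Dict[str, ParamGroup]) -> float:
    """Share of all parameters that the optimizer would update."""
    total = sum(g.size for g in model.values())
    return sum(g.size for g in model.values() if g.trainable) / total


# Hypothetical sizes: 9.8M shared parameters plus a 0.2M task-specific path.
base = stage_one({"encoder": 4_000_000, "decoder": 4_000_000, "entropy_model": 1_800_000})
tuned = stage_two(base, {"machine_path": 200_000})
print(f"{trainable_fraction(tuned):.2%}")  # 2.00%
```

Returning new groups instead of mutating `base` keeps the stage-one model intact, so several task-specific stage-two variants could in principle share the same frozen weights.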

All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation

A Unified Image Compression Method for Human Perception and Multiple Vision Tasks

Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach

Slimmable Multi-Task Image Compression for Human and Machine Vision

Towards Coding for Human and Machine Vision: Scalable Face Image Coding

Deep Image Compression Towards Machine Vision: A Unified Optimization Framework

Deep Image Compression Toward Machine Vision: A Unified Optimization Framework

Video Coding for Machines: Compact Visual Representation Compression for Intelligent Collaborative Analytics

Scalable image coding with enhancement features for human and machine

Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics

Multi-view Video Coding Based on View Prediction

Collaborative Scalable Visual Compression for Human-Centered Videos

Task-Switchable Pre-Processor for Image Compression for Multiple Machine Vision Tasks

Rethinking the Joint Optimization in Video Coding for Machines: A Case Study

Scalable Face Image Coding via StyleGAN Prior: Towards Compression for Human-Machine Collaborative Vision

Scalable Face Image Coding via StyleGAN Prior: Toward Compression for Human-Machine Collaborative Vision

End-to-End Learned Scalable Multilayer Feature Compression for Machine Vision Tasks

Preprocessing Enhanced Image Compression for Machine Vision

An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal

Sketch Assisted Face Image Coding for Human and Machine Vision: A Joint Training Approach

Facial Image Compression via Neural Image Manifold Compression