All-in-One Image Coding for Joint Human-Machine Vision with Multi-Path Aggregation

Xu Zhang, Peiyao Guo, Ming Lu, Zhan Ma
2024-09-29
Abstract: Image coding for multi-task applications, catering to both human perception and machine vision, has been extensively investigated. Existing methods often rely on multiple task-specific encoder-decoder pairs, leading to high overhead of parameter and bitrate usage, or face challenges in multi-objective optimization under a unified representation, failing to achieve both performance and efficiency. To this end, we propose Multi-Path Aggregation (MPA) integrated into existing coding models for joint human-machine vision, unifying the feature representation with an all-in-one architecture. MPA employs a predictor to allocate latent features among task-specific paths based on feature importance varied across tasks, maximizing the utility of shared features while preserving task-specific features for subsequent refinement. Leveraging feature correlations, we develop a two-stage optimization strategy to alleviate multi-task performance degradation. Upon the reuse of shared features, as low as 1.89% parameters are further augmented and fine-tuned for a specific task, which completely avoids extensive optimization of the entire model. Experimental results show that MPA achieves performance comparable to state-of-the-art methods in both task-specific and multi-objective optimization across human viewing and machine analysis tasks. Moreover, our all-in-one design supports seamless transitions between human- and machine-oriented reconstruction, enabling task-controllable interpretation without altering the unified model. Code is available at https://github.com/NJUVISION/MPA.
Computer Vision and Pattern Recognition, Image and Video Processing
### What problems does this paper attempt to solve?

This paper addresses image coding that must serve human perception and machine vision at the same time in multi-task applications. Existing methods either rely on multiple task-specific encoder-decoder pairs, which incurs high parameter and bitrate overhead, or attempt multi-objective optimization under a unified representation and fail to achieve both performance and efficiency. To address these challenges, the authors propose **Multi-Path Aggregation (MPA)** and integrate it into existing coding models for joint human-machine vision.

#### Main problems:

1. **High parameter and bitrate overhead**: Relying on a separate encoder-decoder pair for each task inflates both the parameter count and the bitrate.
2. **Difficulty in multi-objective optimization**: When several tasks are optimized under one unified representation, their requirements are hard to balance, and performance degrades.
3. **Lack of flexibility**: Existing methods cannot switch flexibly between tasks while keeping task-specific optimization efficient.

#### Solutions:

- **Multi-Path Aggregation (MPA)**: A predictor allocates latent features among task-specific paths according to how important each feature is for each task. This maximizes the reuse of shared features while retaining task-specific features for subsequent refinement.
- **Two-stage optimization strategy**: Leveraging feature correlations, a two-stage optimization strategy alleviates multi-task performance degradation. Because shared features are reused, as little as 1.89% of the parameters are added and fine-tuned for a specific task, which avoids extensive optimization of the entire model.
- **Task-controllable interpretation**: The all-in-one design supports seamless transitions between human- and machine-oriented reconstruction without altering the unified model, so a single representation can be interpreted in different ways.

With these components, MPA achieves performance comparable to state-of-the-art methods in both task-specific and multi-objective optimization, while reducing parameter and bitrate overhead and improving the efficiency of multi-task collaboration.
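
The path allocation idea summarized above can be illustrated with a small sketch. This is not the authors' implementation (see the linked repository for that); it assumes only two paths, per-token importance scores that are already given, and a simple top-k selection rule. The names `allocate`, `aggregate`, `shared_path`, `task_path`, and `ratio` are hypothetical and chosen for illustration.

```python
from typing import Callable, List, Sequence

Token = List[float]
Path = Callable[[Token], Token]


def allocate(scores: Sequence[float], ratio: float) -> List[bool]:
    """Return a mask that is True for tokens routed to the task-specific path.

    Assumption: the fraction `ratio` of tokens with the highest importance
    scores is selected (a top-k rule); the count is rounded to an integer.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    k = round(ratio * len(scores))
    # Indices sorted by descending score; ties are broken by position.
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    chosen = set(order[:k])
    return [i in chosen for i in range(len(scores))]


def aggregate(
    tokens: Sequence[Token],
    scores: Sequence[float],
    shared_path: Path,
    task_path: Path,
    ratio: float,
) -> List[Token]:
    """Send each token through exactly one path and merge results in order."""
    mask = allocate(scores, ratio)
    return [
        task_path(tok) if to_task else shared_path(tok)
        for tok, to_task in zip(tokens, mask)
    ]


if __name__ == "__main__":
    tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    scores = [0.1, 0.9, 0.4, 0.8]  # hypothetical predictor output

    def shared(tok: Token) -> Token:  # stand-in for the shared path
        return list(tok)

    def task(tok: Token) -> Token:  # stand-in for a task-specific path
        return [x + 10.0 for x in tok]

    print(aggregate(tokens, scores, shared, task, ratio=0.5))
    # → [[1.0, 2.0], [13.0, 14.0], [5.0, 6.0], [17.0, 18.0]]
```

In this sketch, setting `ratio` to 0 sends every token through the shared path and setting it to 1 sends every token through the task-specific path, which loosely mirrors the idea of switching between reconstructions without changing the underlying model.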