Abstract: Video coding, which aims to compress and reconstruct the whole frame, and feature compression, which preserves and transmits only the most critical information, stand at two ends of the scale. Feature compression offers compactness and efficiency in service of machine vision, whereas video coding offers full fidelity in service of human perception. Recent endeavors along emerging trends in video compression, e.g. deep-learning-based coding tools and end-to-end image/video coding, and the MPEG-7 compact feature descriptor standards, i.e. Compact Descriptors for Visual Search (CDVS) and Compact Descriptors for Video Analysis (CDVA), have driven rapid and sustained progress in their respective directions. In this paper, benefiting from booming AI technologies such as prediction and generation models, we explore a new area, Video Coding for Machines (VCM), arising from the emerging MPEG standardization efforts. Towards collaborative compression and intelligent analytics, VCM attempts to bridge the gap between feature coding for machine vision and video coding for human vision. In line with Digital Retina, a rising instance of the Analyze-then-Compress paradigm, we first give the definition, formulation, and paradigm of VCM. We then systematically review state-of-the-art techniques in video compression and feature compression from the unique perspective of MPEG standardization, which provides academic and industrial evidence for realizing collaborative compression of video and feature streams in a broad range of AI applications. Finally, we propose potential VCM solutions, and preliminary results demonstrate gains in both performance and efficiency. Future directions are discussed as well.
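To make the idea of "collaborative compression of video and feature streams" more concrete, the sketch below shows one way a scalable two-layer stream could serve both consumers. It is a toy illustration under stated assumptions, not the VCM method proposed in the paper. A compact base layer (here, hypothetically, one quantized mean per block of a 1-D signal, standing in for a learned feature) serves machine analytics. An enhancement layer carrying the quantized residual lets a human-oriented decoder reconstruct the full signal. The names `ScalableStream`, `encode`, `decode_for_machine`, and `decode_for_human`, and the `block`/`step` parameters, are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScalableStream:
    """Hypothetical two-layer stream: a compact feature layer plus a residual layer."""
    feature_layer: List[int]      # base layer, consumed by machine-vision decoders
    enhancement_layer: List[int]  # residual layer, needed only for human viewing
    block: int                    # block size used to form the base layer
    step: float                   # uniform quantization step for both layers


def upsample(feature: List[int], block: int, step: float, n: int) -> List[float]:
    """Predict the full signal from the base layer by nearest-neighbour repetition."""
    out: List[float] = []
    for q in feature:
        out.extend([q * step] * block)
    return out[:n]  # trim in case the last block was shorter


def encode(frame: List[float], block: int = 4, step: float = 1.0) -> ScalableStream:
    # Base layer: one quantized mean per block (a toy stand-in for a compact feature).
    means = [sum(frame[i:i + block]) / len(frame[i:i + block])
             for i in range(0, len(frame), block)]
    feature = [round(m / step) for m in means]
    # Enhancement layer: quantized residual between the frame and its base-layer prediction.
    pred = upsample(feature, block, step, len(frame))
    residual = [round((x - p) / step) for x, p in zip(frame, pred)]
    return ScalableStream(feature, residual, block, step)


def decode_for_machine(s: ScalableStream) -> List[float]:
    # Machine analytics reads only the compact base layer.
    return [q * s.step for q in s.feature_layer]


def decode_for_human(s: ScalableStream) -> List[float]:
    # Human viewing combines the base-layer prediction with the residual layer.
    pred = upsample(s.feature_layer, s.block, s.step, len(s.enhancement_layer))
    return [p + r * s.step for p, r in zip(pred, s.enhancement_layer)]
```

For example, `encode([1.0, 3.0, 5.0, 7.0, 10.0, 10.0, 10.0, 10.0])` yields a two-value feature layer `[4, 10]` for the machine side, while the full signal is recovered only when the residual layer is also decoded. The design choice being illustrated is that the machine-side decoder never needs the enhancement layer, so it can operate at a much lower bit cost.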

Exploring the Benefits of Cross-Modal Coding

Cross-Modal Transmission Strategy

Cross-Modal Semantic Communications

Cross-Modal Collaborative Communications

Cross-modal Communication Technology: A Survey

Haptic Signal Reconstruction for Cross-Modal Communications

Cross Modal Compression: Towards Human-comprehensible Semantic Compression

Tactile Codec with Visual Assistance in Multi-modal Communication for Digital Health

Cross-Modal Stream Transmission: Architecture, Strategy, and Technology

When Video Coding Meets Multimodal Large Language Models: A Unified Paradigm for Video Coding

Cross Modal Compression with Variable Rate Prompt

An Emerging Coding Paradigm VCM: A Scalable Coding Approach Beyond Feature and Signal

Cross-modal Semantic Communications in 6G

Beyond VVC: Towards Perceptual Quality Optimized Video Compression Using Multi-Scale Hybrid Approaches

Video Coding for Machines: A Paradigm of Collaborative Compression and Intelligent Analytics

Joint Source-Channel Coding: Fundamentals and Recent Progress in Practical Designs

Hybrid model-and-object-based real-time conversational video coding

Perception-Aware Cross-Modal Signal Reconstruction: From Audio-Haptic to Visual

Rate-Adaptive Coding Mechanism for Semantic Communications With Multi-Modal Data

Heterogeneous Stream Scheduling for Cross-Modal Transmission

Edge-Based Cross-Modal Communications for Remote Healthcare