Abstract: Classifying videos into distinct categories, such as Sport and Music Video, is crucial for multimedia understanding and retrieval, especially when an immense volume of video content is being constantly generated. Traditional methods require video decompression to extract pixel-level features like color, texture, and motion, thereby increasing computational and storage demands. Moreover, these methods often suffer from performance degradation in low-quality videos. We present a novel approach that examines only the post-compression bitstream of a video to perform classification, eliminating the need for bitstream decoding. To validate our approach, we built a comprehensive data set comprising over 29,000 YouTube video clips, totaling 6,000 hours and spanning 11 distinct categories. Our evaluations indicate precision, accuracy, and recall rates consistently above 80%, many exceeding 90%, and some reaching 99%. The algorithm operates approximately 15,000 times faster than real-time for 30fps videos, outperforming the traditional Dynamic Time Warping (DTW) algorithm by seven orders of magnitude.
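To put the reported speed in concrete terms, the short calculation below converts "15,000 times faster than real-time" into frames per second and wall-clock time. It is only back-of-the-envelope arithmetic on the figures quoted in the abstract; the function name `processing_seconds` and the assumption of a single sequential worker are illustrative and do not come from the paper.

```python
# Figures taken from the abstract.
SPEEDUP = 15_000        # times faster than real-time
FPS = 30                # frame rate the speed figure refers to
DATASET_HOURS = 6_000   # total duration of the evaluation data set


def processing_seconds(video_hours: float, speedup: float = SPEEDUP) -> float:
    """Wall-clock seconds to classify `video_hours` of video.

    Assumes one sequential worker running at `speedup` times real-time
    (an illustrative assumption, not a claim about the authors' setup).
    """
    return video_hours * 3600 / speedup


# Equivalent frame throughput at 30 fps.
frames_per_second = SPEEDUP * FPS

print(frames_per_second)                        # 450000
print(processing_seconds(1))                    # 0.24
print(processing_seconds(DATASET_HOURS) / 60)   # 24.0
```

Under these assumptions, one hour of video takes roughly a quarter of a second to classify, and the full 6,000-hour data set takes about 24 minutes.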

What problem does this paper attempt to address?

This paper attempts to solve several key problems in video classification:

1. **Reduce computational and storage requirements**: Traditional methods must decompress a video before classifying it in order to extract pixel-level features such as color, texture, and motion. This increases the amount of computation and significantly raises the storage requirements. For example, for a 1080p30 video compressed at 10 Mbps, the decoded video may require about 75 times as much storage as the compressed stream. The paper therefore proposes a method that classifies directly from the compressed bitstream without decoding, significantly reducing both computational and storage requirements.
2. **Improve the classification performance of low-quality videos**: Traditional methods often perform poorly on low-quality videos. By analyzing the compressed bitstream, the method in this paper can maintain relatively high classification accuracy on such videos.
3. **Protect privacy**: Traditional methods usually need to decrypt videos before classification, which may lead to privacy issues. For videos protected by digital rights management (DRM) in particular, sensitive information may be exposed during decryption. The method in this paper avoids these issues because it does not need to decrypt videos to classify them.
4. **Achieve ultra-high-speed classification**: The proposed algorithm runs at approximately 15,000 times real-time speed on 30-fps videos, far exceeding the traditional dynamic time warping (DTW) algorithm, which is seven orders of magnitude slower on the same task. This enables the method to process large-scale video data efficiently, such as the 30,000 hours of video uploaded to YouTube every hour.

In conclusion, the main objective of this paper is to provide an efficient, accurate, and privacy-protecting video classification method that directly analyzes the compressed video bitstream.
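The 75-fold storage figure for a 1080p30 video at 10 Mbps can be checked with a quick calculation. The sketch below assumes the decoded frames are stored as 8-bit YUV 4:2:0 (12 bits per pixel on average); the summary does not state the pixel format, so this is an assumption, and other formats such as 24-bit RGB would give a larger ratio. The helper name `raw_bitrate_mbps` is hypothetical.

```python
def raw_bitrate_mbps(width: int, height: int, fps: int,
                     bits_per_pixel: int = 12) -> float:
    """Bitrate of uncompressed video in Mbps.

    bits_per_pixel=12 corresponds to 8-bit YUV 4:2:0, which is an
    assumption made here; the source does not name the pixel format.
    """
    return width * height * fps * bits_per_pixel / 1e6


COMPRESSED_MBPS = 10  # compressed bitrate quoted in the example above

raw = raw_bitrate_mbps(1920, 1080, 30)   # 1080p at 30 fps
ratio = raw / COMPRESSED_MBPS

print(f"raw bitrate: {raw:.1f} Mbps")     # raw bitrate: 746.5 Mbps
print(f"expansion:   {ratio:.1f}x")       # expansion:   74.6x
```

With these assumptions the decoded stream is about 74.6 times larger than the compressed one, which is consistent with the "75 times" figure quoted above.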

Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification

Judging a video by its bitstream cover

Motion Guided Token Compression for Efficient Masked Video Modeling

Foreground-Background Parallel Compression with Residual Encoding for Surveillance Video

Spatio-Temporal Deformable Convolution for Compressed Video Quality Enhancement

Watching a Small Portion Could Be As Good As Watching All: Towards Efficient Video Classification

Efficient Super-Resolution for Compression of Gaming Videos

Efficient Semantic Segmentation for Compressed Video

Accurate and Fast Compressed Video Captioning

Leveraging Bitstream Metadata for Fast, Accurate, Generalized Compressed Video Quality Enhancement

Efficient Large Scale Video Classification

A video compression-cum-classification network for classification from compressed video streams

Fast Video Classification with CNNs in Compressed Domain

Video Feature Compression for Machine Tasks

FastClip: an Efficient Video Understanding System with Heterogeneous Computing and Coarse-to-fine Processing

A Coding Framework and Benchmark towards Compressed Video Understanding

Compressed Vision for Efficient Video Understanding

Fast-MFQE: A Fast Approach for Multi-Frame Quality Enhancement on Compressed Video

A Preprocessing Framework for Video Machine Vision under Compression

Real Time Video Object Segmentation in Compressed Domain

A Method for Enhancing the Quality of Compressed Videos Based on 2D Convolution and Aggregating Spatio-Temporal Information