Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification

Yuxing Han, Yunan Ding, Chen Ye Gan, Jiangtao Wen
2024-03-13
Abstract: Classifying videos into distinct categories, such as Sport and Music Video, is crucial for multimedia understanding and retrieval, especially when an immense volume of video content is being constantly generated. Traditional methods require video decompression to extract pixel-level features like color, texture, and motion, thereby increasing computational and storage demands. Moreover, these methods often suffer from performance degradation in low-quality videos. We present a novel approach that examines only the post-compression bitstream of a video to perform classification, eliminating the need for bitstream decoding. To validate our approach, we built a comprehensive dataset comprising over 29,000 YouTube video clips, totaling 6,000 hours and spanning 11 distinct categories. Our evaluations indicate precision, accuracy, and recall rates consistently above 80%, many exceeding 90%, and some reaching 99%. The algorithm operates approximately 15,000 times faster than real-time for 30fps videos, outperforming the traditional Dynamic Time Warping (DTW) algorithm by seven orders of magnitude.
Computer Vision and Pattern Recognition, Multimedia, Image and Video Processing
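The abstract benchmarks against Dynamic Time Warping (DTW). As a point of reference only, and not the authors' classifier or their exact baseline configuration, the sketch below applies textbook DTW to sequences of per-frame compressed sizes and puts a 1-nearest-neighbor rule on top. The frame-size values, the reference sequences, and the helper names `dtw_distance` and `classify_1nn` are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook dynamic time warping with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def classify_1nn(query: Sequence[float],
                 references: List[Tuple[str, Sequence[float]]]) -> str:
    """Hypothetical 1-NN rule: label of the closest reference under DTW."""
    return min(references, key=lambda item: dtw_distance(query, item[1]))[0]


# Invented per-frame sizes (kilobytes), for illustration only.
refs = [
    ("Sport", [42.0, 40.5, 43.1, 41.8, 44.0, 42.7]),
    ("Music Video", [12.0, 85.0, 10.5, 90.2, 11.3, 88.0]),
]
query = [41.0, 42.2, 40.9, 43.5, 42.0]
print(classify_1nn(query, refs))  # prints: Sport
```

Each comparison fills an (n+1) × (m+1) table, so the work grows with the product of the two sequence lengths for every reference sequence compared.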
What problem does this paper attempt to address?
This paper attempts to solve several key problems in video classification:
1. **Reduce computational and storage requirements**: Traditional methods must decompress a video before classifying it in order to extract pixel-level features such as color, texture, and motion. Decoding adds computation and sharply raises storage requirements: for a 1080p30 video compressed at 10 Mbps, the decoded output can be about 75 times larger than the compressed bitstream. The paper therefore classifies videos directly from the compressed bitstream, without decoding, which substantially reduces both computation and storage.
2. **Improve classification of low-quality videos**: Traditional methods often perform poorly on low-quality videos. By analyzing the compressed bitstream, the proposed method maintains relatively high classification accuracy on such videos.
3. **Protect privacy**: Traditional methods usually need to decrypt videos before classification, which can raise privacy concerns. For videos protected by digital rights management (DRM) in particular, sensitive information may be exposed during decryption. The proposed method avoids these issues because it classifies videos without decrypting them.
4. **Achieve ultra-high-speed classification**: The proposed algorithm runs at approximately 15,000 times real-time speed on 30fps videos, whereas the traditional dynamic time warping (DTW) algorithm is seven orders of magnitude slower on the same task. This makes the method efficient enough to process large-scale video data, such as the 30,000 hours of video uploaded to YouTube every hour.
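The storage and throughput figures quoted in the first and fourth points can be checked with simple arithmetic. This sketch assumes the decoded output is 8-bit YUV 4:2:0 (12 bits per pixel) and that the 15,000x speed-up applies to one processing instance that scales linearly; both are assumptions, not details stated in the summary.

```python
import math


def raw_bitrate_mbps(width: int, height: int, fps: int,
                     bits_per_pixel: int = 12) -> float:
    """Uncompressed bitrate in Mbps. Assumption: 12 bits/pixel = 8-bit YUV 4:2:0."""
    return width * height * bits_per_pixel * fps / 1e6


# Storage blow-up of decoded 1080p30 relative to a 10 Mbps compressed stream.
raw = raw_bitrate_mbps(1920, 1080, 30)  # 746.496 Mbps
ratio = raw / 10.0
print(f"{raw:.1f} Mbps raw, about {ratio:.0f}x the 10 Mbps stream")
# prints: 746.5 Mbps raw, about 75x the 10 Mbps stream

# Throughput: 15,000x real time on 30fps video.
speedup = 15_000
frames_per_second = 30 * speedup  # frames analyzed per wall-clock second

# Assumption: one instance runs at the quoted speed and instances scale linearly.
upload_hours_per_hour = 30_000
instances = math.ceil(upload_hours_per_hour / speedup)
print(frames_per_second, instances)  # prints: 450000 2
```

Under these assumptions, two instances would keep pace with the quoted YouTube upload rate.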
In conclusion, the main objective of this paper is to provide an efficient, accurate, and privacy-preserving video classification method that directly analyzes the compressed video bitstream.