Object-Based Video Multi-Label Classification with an Improved 3D Convolutional Neural Network

Xiangchun Zhou, Yue Li, Yuan Jiao, Yangkexin Liang, Yichun Shang, Wei Wang
DOI: https://doi.org/10.1109/icicta49267.2019.00052
2019-01-01
Abstract: Video multi-label classification is one of the most important fields in computer vision and a step toward further study of video understanding. Most existing effective methods are based on 2D CNN models, which perform well at spatial feature extraction. In this paper, motivated by emerging studies on objectness in videos, such as object detection and object recognition, we propose an improved 3-dimensional convolutional neural network (3D CNN) model for video multi-label classification based on the potential objects in videos. The proposed method characterizes objects, which represent condensed and significant video content, while also taking the characteristics of the time dimension into account. The effectiveness of the proposed method is demonstrated through an experiment on the YouTube Dataset, where it outperforms other conventional methods for video multi-label classification.
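The two ingredients named in the abstract, a convolution that spans the time dimension and a multi-label output, can be illustrated with a minimal pure-Python sketch. This is not the authors' model: the function names (`conv3d_valid`, `predict_labels`), the single-channel input, the absence of padding, and the 0.5 threshold are all assumptions made for illustration.

```python
import math


def conv3d_valid(clip, kernel):
    """Naive single-channel 3D cross-correlation, stride 1, no padding.

    clip is indexed [time][height][width]; kernel is [kt][kh][kw].
    Hypothetical helper for illustration, not the paper's implementation.
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # The kernel covers kt consecutive frames, so each output
                # value mixes spatial and temporal neighbours.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_labels(logits, threshold=0.5):
    """Multi-label decision: every label is thresholded independently,
    so a video can receive zero, one, or several labels."""
    return [sigmoid(z) >= threshold for z in logits]


# A toy clip of 4 frames, each 3x3, and a 2x2x2 kernel of ones.
clip = [[[1.0] * 3 for _ in range(3)] for _ in range(4)]
kernel = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
features = conv3d_valid(clip, kernel)
labels = predict_labels([2.0, -1.0, 0.5])
```

A 2D convolution would process each frame separately; here the kernel's first axis slides over frames, which is what lets a 3D CNN capture motion. The independent sigmoid per label, rather than a softmax over labels, is the usual way to let several labels be active for one video.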