Semantic Annotation for Complex Video Street Views Based on 2D–3D Multi-Feature Fusion and Aggregated Boosting Decision Forests

Xun Wang, Guoli Yan, Huiyan Wang, Jianhai Fu, Jing Hua, Jingqi Wang, Yutao Yang, Guofeng Zhang, Hujun Bao
DOI: https://doi.org/10.1016/j.patcog.2016.08.030
IF: 8
2016-01-01
Pattern Recognition
Abstract: Accurate and efficient semantic annotation is an important but difficult step in large-scale video interpretation. This paper presents a novel framework based on 2D–3D multi-feature fusion and an aggregated boosting decision forest (ABDF) for semantic annotation of video street views. We first integrate 3D and 2D features to define an appearance model that characterizes the different types of superpixels and the similarities between adjacent superpixel blocks. We then propose the ABDF algorithm, which builds weak classifiers using a modified integrated splitting strategy for decision trees. A Markov random field is then adopted to perform global optimization over superpixel blocks, correcting minor errors and smoothing the boundaries of the semantic annotation. Finally, a boosting strategy is used to aggregate the different weak decision trees into one final strong classification decision tree. Because the superpixel block rather than the pixel is the basic processing unit, only a small number of features is required to build an accurate and efficient model. The experimental results demonstrate the advantages of the proposed method over existing semantic segmentation methods in terms of classification accuracy and computational efficiency. The proposed framework can be used in real-time video processing applications.
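To make the fusion-then-boosting idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' ABDF: it assumes plain concatenation for the 2D–3D fusion, uses standard binary AdaBoost over one-level decision stumps in place of the paper's modified splitting strategy, and omits the Markov random field step. All function names and the synthetic per-superpixel features are hypothetical.

```python
import numpy as np

def fuse_features(feat_2d, feat_3d):
    """Assumed fusion: concatenate per-superpixel 2D and 3D descriptors."""
    return np.hstack([feat_2d, feat_3d])

def stump_predict(X, j, t, polarity):
    """One-level decision tree: threshold feature j at t; output in {-1, +1}."""
    return np.where(polarity * (X[:, j] - t) >= 0, 1, -1)

def fit_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for polarity in (1, -1):
                err = w[stump_predict(X, j, t, polarity) != y].sum()
                if err < best[0]:
                    best = (err, j, t, polarity)
    return best

def adaboost_fit(X, y, n_rounds=5):
    """Standard binary AdaBoost; y must contain labels in {-1, +1}."""
    w = np.full(len(y), 1.0 / len(y))
    ensemble = []
    for _ in range(n_rounds):
        err, j, t, polarity = fit_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # avoid log(0) on perfect splits
        alpha = 0.5 * np.log((1 - err) / err)  # vote weight of this weak learner
        w = w * np.exp(-alpha * y * stump_predict(X, j, t, polarity))
        w = w / w.sum()                        # re-normalise sample weights
        ensemble.append((alpha, j, t, polarity))
    return ensemble

def adaboost_predict(X, ensemble):
    """Weighted vote of all stumps."""
    score = sum(a * stump_predict(X, j, t, p) for a, j, t, p in ensemble)
    return np.where(score >= 0, 1, -1)

# Synthetic stand-in for 40 superpixels with two 2D and two 3D features each.
rng = np.random.default_rng(0)
labels = np.repeat([1, -1], 20)
feat_2d = rng.normal(loc=2.0 * labels[:, None], scale=0.5, size=(40, 2))
feat_3d = rng.normal(loc=2.0 * labels[:, None], scale=0.5, size=(40, 2))

fused = fuse_features(feat_2d, feat_3d)
ensemble = adaboost_fit(fused, labels, n_rounds=5)
accuracy = np.mean(adaboost_predict(fused, ensemble) == labels)
```

Working on superpixels keeps the number of samples small, which is why even the exhaustive threshold search in `fit_stump` stays cheap here; a per-pixel version of the same loop would be far more expensive.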