Using a Selective Ensemble Support Vector Machine to Fuse Multimodal Features for Human Action Recognition

Chao Tang, Anyang Tong, Aihua Zheng, Hua Peng, Wei Li
DOI: https://doi.org/10.1155/2022/1877464
IF: 3.12
2022-01-10
Computational Intelligence and Neuroscience
Abstract: Traditional human action recognition (HAR) methods are based on RGB video. Recently, with the introduction of Microsoft Kinect and other consumer-class depth cameras, HAR based on RGB-D (RGB-Depth) data has drawn increasing attention from scholars and industry. Compared with the traditional method, HAR based on RGB-D offers higher accuracy and stronger robustness. This paper proposes using a selective ensemble support vector machine to fuse multimodal features for human action recognition. The algorithm combines improved HOG features from RGB modal data, depth motion map-based local binary pattern features (DMM-LBP), and hybrid joint features (HJF) from joints modal data. In addition, a frame-based selective ensemble support vector machine classification model (SESVM) is proposed, which integrates the selective ensemble strategy with the selection of SVM base classifiers, thus increasing the differences between the base classifiers. The experimental results demonstrate that the proposed method is simple, fast, and efficient on public datasets in comparison with other action recognition algorithms.
mathematical & computational biology, neurosciences
What problem does this paper attempt to address?
This paper addresses how to fuse multimodal features effectively so as to improve the accuracy and robustness of human action recognition (HAR). Traditional HAR methods rely mainly on RGB video data, which performs poorly under complex backgrounds, occlusions, shadows, scale changes, and varying lighting conditions. In addition, the same action looks different when observed from different viewpoints, the same action performed by different people can differ significantly, and two actions of different types may look very similar. These inherent weaknesses limit the performance of HAR based on RGB information alone.

To address these problems, the paper proposes fusing multimodal features for HAR with a selective ensemble support vector machine (SESVM). Specifically, the method combines improved HOG features (from RGB modal data), local binary pattern (LBP) features computed on the depth motion map (DMM), and hybrid joint features (HJF) from joints modal data. A frame-based SESVM classification model is also proposed; it integrates the selective ensemble strategy with the selection of SVM base classifiers, thereby increasing the differences between the base classifiers. According to the reported experiments on public datasets, the method is simple, fast, and efficient, and compares favorably with other action recognition algorithms.
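The general idea of a selective ensemble can be illustrated with a minimal sketch. It is not the paper's SESVM: the selection criterion (exhaustive search for the subset with the best majority-vote accuracy on a validation set), the function names, and the toy predictions are all assumptions made for illustration. Each list of predictions stands in for the output of one base classifier, for example an SVM trained on a single modality.

```python
from collections import Counter
from itertools import combinations
from typing import List, Sequence, Tuple


def majority_vote(labels: Sequence[int]) -> int:
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def ensemble_accuracy(preds: List[List[int]], truth: List[int]) -> float:
    """Accuracy of the majority vote over the given base-classifier predictions."""
    correct = 0
    for i, y in enumerate(truth):
        votes = [p[i] for p in preds]
        correct += int(majority_vote(votes) == y)
    return correct / len(truth)


def select_ensemble(
    all_preds: List[List[int]], truth: List[int]
) -> Tuple[Tuple[int, ...], float]:
    """Hypothetical selection rule: try every non-empty subset of base
    classifiers and keep the one with the highest validation accuracy,
    preferring smaller subsets when accuracies are equal."""
    best_subset: Tuple[int, ...] = ()
    best_acc = -1.0
    for size in range(1, len(all_preds) + 1):
        for subset in combinations(range(len(all_preds)), size):
            acc = ensemble_accuracy([all_preds[j] for j in subset], truth)
            if acc > best_acc:  # strict: earlier (smaller) subsets win ties
                best_subset, best_acc = subset, acc
    return best_subset, best_acc


# Toy validation labels and made-up predictions from four base classifiers.
truth = [0, 1, 2, 0, 1, 2]
base_predictions = [
    [0, 1, 2, 0, 1, 0],  # stand-in for a classifier on RGB HOG features
    [0, 1, 0, 0, 1, 2],  # stand-in for a classifier on DMM-LBP features
    [1, 1, 2, 0, 0, 2],  # stand-in for a classifier on HJF features
    [2, 2, 2, 2, 2, 2],  # a weak classifier that always predicts class 2
]

selected, accuracy = select_ensemble(base_predictions, truth)
print(selected, accuracy)
```

In this toy data no single classifier is perfect, but the three that make different mistakes outvote each other's errors, while the weak classifier is left out. This illustrates why differences between base classifiers matter in an ensemble. Exhaustive search is only practical for a handful of classifiers; a larger pool would need a heuristic selection strategy.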