Using a Selective Ensemble Support Vector Machine to Fuse Multimodal Features for Human Action Recognition

Chao Tang, Anyang Tong, Aihua Zheng, Hua Peng, Wei Li

DOI: https://doi.org/10.1155/2022/1877464

IF: 3.12

2022-01-10

Computational Intelligence and Neuroscience

Abstract: Traditional human action recognition (HAR) methods are based on RGB video. Recently, with the introduction of Microsoft Kinect and other consumer-grade depth cameras, HAR based on RGB-D (RGB-Depth) data has drawn increasing attention from researchers and industry. Compared with the traditional approach, RGB-D-based HAR offers higher accuracy and stronger robustness. This paper proposes using a selective ensemble support vector machine to fuse multimodal features for human action recognition. The algorithm combines improved HOG features from the RGB modality, depth motion map-based local binary pattern (DMM-LBP) features, and hybrid joint features (HJF) from the joint modality. In addition, a frame-based selective ensemble support vector machine classification model (SESVM) is proposed, which integrates the selective ensemble strategy with the selection of SVM base classifiers, thus increasing the differences between the base classifiers. Experimental results on public datasets indicate that the proposed method is simple, fast, and efficient in comparison with other action recognition algorithms.
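To make the DMM-LBP idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly used definition of a depth motion map (the accumulated absolute difference between consecutive depth frames, shown here for the front view only) and the basic 8-neighbour LBP operator; the paper's exact variants, projections, and parameters are not given in this summary. The function names `depth_motion_map`, `lbp_code`, and `lbp_histogram` are hypothetical.

```python
def depth_motion_map(frames):
    """Accumulate absolute differences between consecutive depth frames.

    frames: list of 2D lists (rows x cols) of depth values.
    Assumption: front-view projection only, no motion threshold.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    dmm = [[0] * cols for _ in range(rows)]
    for prev, curr in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                dmm[r][c] += abs(curr[r][c] - prev[r][c])
    return dmm


# Clockwise 8-neighbour offsets, starting at the top-left pixel.
_NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """Basic 8-neighbour LBP code for the interior pixel at (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(_NEIGHBOURS):
        # A neighbour at least as large as the centre sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist
```

In this sketch the histogram returned by `lbp_histogram(depth_motion_map(frames))` would play the role of a DMM-LBP feature vector; a practical pipeline would typically compute it per image block and per projection view.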

mathematical & computational biology, neurosciences

What problem does this paper attempt to address?

This paper addresses how to fuse multimodal features effectively in order to improve the accuracy and robustness of Human Action Recognition (HAR). Traditional HAR methods rely mainly on RGB video data, which performs poorly under complex backgrounds, occlusions, shadows, scale changes, and varying lighting conditions. In addition, the same action looks different when observed from different viewpoints, the same action performed by different people can differ significantly, and two actions of different types may closely resemble each other. These inherent limitations constrain the performance of HAR based on RGB information alone.

To address these problems, the paper proposes a method based on a Selective Ensemble Support Vector Machine (SESVM) that fuses multimodal features for human action recognition. Specifically, the method combines improved HOG features (from the RGB modality), Local Binary Pattern (LBP) features computed on the Depth Motion Map (DMM), and Hybrid Joint Features (HJF) from the joint modality. A frame-based SESVM classification model is also proposed. This model integrates the selective ensemble strategy with the selection of SVM base classifiers, thereby increasing the differences between the base classifiers. According to the experiments reported on public datasets, the method is simple, fast, and efficient, and compares favorably with other action recognition algorithms.
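The selective ensemble idea can be illustrated with a small sketch. This is not the SESVM algorithm itself, whose selection criterion is not described in this summary. It assumes that each base SVM (for example, one per modality) has already produced predictions on a held-out validation set, and it swaps in a simple stand-in strategy: an exhaustive search for the subset of base classifiers whose majority vote is most accurate. All names (`majority_vote`, `select_ensemble`, `svm_hog`, and so on) are hypothetical.

```python
from collections import Counter
from itertools import combinations


def majority_vote(predictions):
    """Fuse per-classifier label lists into one label per sample.

    Ties go to the label that appears first among the votes.
    """
    fused = []
    for sample_votes in zip(*predictions):
        fused.append(Counter(sample_votes).most_common(1)[0][0])
    return fused


def accuracy(predicted, truth):
    """Fraction of samples whose predicted label matches the truth."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def select_ensemble(val_predictions, val_labels):
    """Pick the subset of base classifiers with the best voted accuracy.

    val_predictions: dict mapping a classifier name to its predicted
    labels on a validation set. Exhaustive search, so only suitable
    for a small pool of base classifiers.
    """
    best_subset, best_acc = [], -1.0
    names = sorted(val_predictions)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            fused = majority_vote([val_predictions[n] for n in subset])
            acc = accuracy(fused, val_labels)
            if acc > best_acc:  # strict: ties keep the smaller subset
                best_subset, best_acc = list(subset), acc
    return best_subset, best_acc


if __name__ == "__main__":
    # Toy validation data, invented purely for illustration.
    labels = [0, 1, 1, 0, 1, 0]
    preds = {
        "svm_hog": [0, 1, 0, 0, 1, 1],
        "svm_dmm": [0, 0, 1, 0, 1, 0],
        "svm_hjf": [1, 1, 1, 0, 0, 0],
        "svm_noisy": [1, 0, 0, 1, 0, 1],
    }
    print(select_ensemble(preds, labels))
```

On this toy data the three classifiers that err on different samples are kept and the uninformative one is dropped, which shows why differences between base classifiers matter: members whose mistakes do not overlap can correct each other under voting.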
