Abstract: Over the past few years, automatic recognition of human interactions has drawn significant attention from researchers in the field of Artificial Intelligence (AI). Feature extraction is one of the most critical tasks in developing efficient Human Interaction Recognition (HIR) systems, and recent research in computer vision suggests that robust features lead to higher recognition accuracy. Hence, this paper proposes an improved HIR system that combines 2D and 3D features extracted using machine learning and deep learning techniques. These discriminative features result in accurate classification and help avoid misclassification of similar interactions. Ten keyframes are extracted from each video to reduce computational complexity. These frames are then preprocessed using image normalization and noise removal techniques. The Region Of Interest (ROI), which contains the two humans involved in the interaction, is extracted using motion detection, and the human silhouettes are segmented using the GrabCut algorithm. Next, the extracted silhouettes are converted into 3D meshes, and their heat kernel signatures (HKS) are computed to extract key body points, from which topological and geometric features are derived. A Convolutional Neural Network (CNN) extracts full-body features from the 2D full-body silhouettes. Finally, the combined feature vector is fed into a Long Short-Term Memory (LSTM) network, and each interaction is recognized using a Softmax classifier. The proposed system has been validated through extensive experiments on three challenging RGB+D datasets, achieving recognition accuracies of 91.63%, 90.54%, and 90.13% on the SBU Kinect Interaction, NTU RGB+D, and ISR-UoL 3D social activity datasets, respectively.
These results suggest that the system can be used effectively in applications such as security, surveillance, health monitoring, and assisted living.
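
The first two stages of the pipeline (selecting ten keyframes per video and locating the ROI through motion detection) can be illustrated with a minimal NumPy sketch. The abstract does not state how keyframes are chosen or which motion detector is used, so uniform temporal sampling and simple frame differencing are assumptions made here for illustration only. The function names `select_keyframes` and `motion_roi` and the `threshold` parameter are hypothetical and do not come from the paper.

```python
import numpy as np


def select_keyframes(num_frames: int, k: int = 10) -> np.ndarray:
    """Return k evenly spaced frame indices.

    Assumption: uniform temporal sampling. The paper only states that
    ten keyframes are extracted, not how they are selected.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Indices repeat when the video has fewer than k frames.
    return np.linspace(0, num_frames - 1, num=k).round().astype(int)


def motion_roi(frames: np.ndarray, threshold: float = 25.0):
    """Bounding box (top, left, bottom, right) of pixels that change.

    Assumption: plain frame differencing on grayscale frames of shape
    (T, H, W). Bottom and right are exclusive. Returns None when no
    pixel changes by more than `threshold`.
    """
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))
    moving = (diffs > threshold).any(axis=0)  # (H, W) mask of changed pixels
    if not moving.any():
        return None
    rows = np.flatnonzero(moving.any(axis=1))
    cols = np.flatnonzero(moving.any(axis=0))
    return int(rows[0]), int(cols[0]), int(rows[-1]) + 1, int(cols[-1]) + 1
```

In a full system, the returned box would be cropped from each keyframe and passed on to silhouette segmentation (GrabCut in the paper). Frame differencing is used here only because it needs no dependencies beyond NumPy.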

An LSTM-Based Approach for Understanding Human Interactions Using Hybrid Feature Descriptors Over Depth Sensors
