Abstract: Over the past few years, automatic recognition of human interactions has drawn significant attention from researchers in Artificial Intelligence (AI). Feature extraction is one of the most critical tasks in developing efficient Human Interaction Recognition (HIR) systems, and recent research in computer vision suggests that robust features lead to higher recognition accuracy. This paper therefore proposes an improved HIR system that combines 2D and 3D features extracted using machine learning and deep learning techniques. These discriminative features result in accurate classification and help avoid misclassification of similar interactions. Ten keyframes are extracted from each video to reduce computational complexity. These frames are then preprocessed using image normalization and noise removal techniques. The Region Of Interest (ROI), which contains the two humans involved in the interaction, is extracted using motion detection, and the human silhouettes are segmented using the GrabCut algorithm. The extracted silhouettes are converted into 3D meshes, and their heat kernel signatures (HKS) are computed to extract key body points. A Convolutional Neural Network (CNN) extracts full-body features from the 2D full-body silhouettes, while topological and geometric features are extracted from the key body points. Finally, the combined feature vector is fed into a Long Short-Term Memory (LSTM) network, and each interaction is recognized using a Softmax classifier. The proposed system has been validated through extensive experiments on three challenging RGB+D datasets, achieving recognition accuracies of 91.63%, 90.54%, and 90.13% on the SBU Kinect Interaction, NTU RGB+D, and ISR-UoL 3D social activity datasets, respectively. These results suggest that the system can be used effectively in applications such as security, surveillance, health monitoring, and assisted living.
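The first and last stages of the pipeline can be illustrated with a minimal sketch. The abstract does not say how the ten keyframes are chosen, so uniform temporal sampling is assumed here purely for illustration; the function names `select_keyframes` and `softmax` are hypothetical and are not taken from the paper. The intermediate stages (GrabCut segmentation, HKS, CNN, and LSTM) are not reproduced.

```python
import math


def select_keyframes(num_frames, k=10):
    """Return up to k frame indices spread evenly over a video.

    Assumption: uniform temporal sampling. The paper only states that
    ten keyframes are extracted, not the selection rule.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    if num_frames <= k:
        # Short clip: keep every frame.
        return list(range(num_frames))
    # Split the video into k equal segments and take each segment's centre.
    return [int((i + 0.5) * num_frames / k) for i in range(k)]


def softmax(scores):
    """Turn raw class scores into probabilities (numerically stable form)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return the index of the most probable interaction class."""
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)
```

For a 100-frame clip, `select_keyframes(100)` yields ten indices, one from the centre of each tenth of the video. In a full system the scores passed to `softmax` would come from the LSTM output layer rather than being supplied by hand.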
