Abstract: Human interaction recognition is an active research topic in computer vision with a wide range of potential applications. It nevertheless faces several difficulties, including the spatial complexity of human interactions, the variation in action characteristics across different time periods, and the complexity of interactive action features. These problems limit the achievable recognition accuracy. To address the variation in action characteristics over time, we propose an improved Gaussian model that fuses time-phase features to extract video keyframes, removing a large amount of redundant information. To address the complexity of interactive action features, we propose a multi-feature fusion network based on parallel Inception and ResNet modules. This network reduces the number of network parameters while improving performance: it alleviates the degradation caused by increasing network depth and achieves higher classification accuracy. To address the spatial complexity of human interactions, we combine features of the whole video with features of the individual videos, making full use of the feature information in the interaction video. On this basis, we propose a human interaction recognition algorithm based on whole–individual detection, in which the whole video captures the global features of both participants and each individual video captures the detailed features of a single person. Making full use of the feature information of both the whole video and the individual videos is the main contribution of this paper to the field of human interaction recognition. Experimental results on the UT-Interaction dataset show that the proposed method achieves an accuracy of 91.7%.
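The abstract does not specify how the improved Gaussian model fuses time-phase features, so the following is only a minimal Python sketch of Gaussian-model keyframe selection under stated assumptions. Each pixel is modelled by a single running Gaussian (a classic background model, not necessarily the one used in this paper), a frame's motion score is the fraction of pixels that deviate from that model, and the "time phases" are assumed to be equal-length segments of the video, from each of which the highest-scoring frame is kept as a keyframe. The function names `motion_scores` and `select_keyframes` and the parameters `alpha`, `k`, `init_var`, and `num_phases` are hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical sketch, not the paper's algorithm: a single running Gaussian
# per pixel plus one keyframe per equal-length time phase.
# A frame is a flattened grayscale image (one intensity value per pixel).
Frame = Sequence[float]


def motion_scores(frames: List[Frame], alpha: float = 0.05, k: float = 2.5,
                  init_var: float = 225.0) -> List[float]:
    """Return, for each frame, the fraction of pixels flagged as foreground.

    A pixel counts as foreground when |x - mean| > k * std of its Gaussian.
    The first frame only initialises the model, so its score is 0.0.
    """
    if not frames:
        return []
    mean = [float(v) for v in frames[0]]
    var = [init_var] * len(mean)
    scores = [0.0]
    for frame in frames[1:]:
        moving = 0
        for i, x in enumerate(frame):
            d = x - mean[i]
            if abs(d) > k * math.sqrt(var[i]):
                moving += 1
            # Exponential update of the per-pixel Gaussian with rate alpha.
            mean[i] += alpha * d
            var[i] = (1.0 - alpha) * var[i] + alpha * d * d
        scores.append(moving / len(frame))
    return scores


def select_keyframes(frames: List[Frame], num_phases: int,
                     **model_kwargs) -> List[int]:
    """Split the video into num_phases equal time phases (an assumption) and
    keep, from each phase, the index of the frame with the highest score."""
    if num_phases <= 0:
        raise ValueError("num_phases must be positive")
    scores = motion_scores(frames, **model_kwargs)
    n = len(scores)
    keyframes = []
    for p in range(num_phases):
        start, end = p * n // num_phases, (p + 1) * n // num_phases
        if end > start:  # skip empty phases when num_phases > n
            keyframes.append(max(range(start, end), key=lambda i: scores[i]))
    return keyframes


if __name__ == "__main__":
    # Ten synthetic 4-pixel frames: static background, bursts at t=3 and t=7.
    video = [[100.0] * 4 for _ in range(10)]
    video[3] = [200.0] * 4
    video[7] = [200.0] * 4
    print(select_keyframes(video, num_phases=2))  # [3, 7]
```

Selecting one frame per phase, rather than the globally highest-scoring frames, is meant to keep every stage of the interaction represented even when one stage contains much more motion than the others.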