A human activity recognition framework in videos using segmented human subject focus

Shaurya Gupta, Dinesh Kumar Vishwakarma, Nitin Kumar Puri
DOI: https://doi.org/10.1007/s00371-023-03256-4
IF: 2.835
2024-02-08
The Visual Computer
Abstract: Automating tasks through human activity recognition in video data has become increasingly vital. Deep learning has yielded versatile activity recognition systems applicable in surveillance, healthcare analysis, sports, and human–computer interaction. Although many video-based activity recognition techniques have been proposed over the years, reliance on RGB frames alone, without other modalities such as joint locations and depth maps, often proves less effective than multimodal methods. In response to this challenge, our paper introduces a competitive approach for identifying human activity in video frames. Leveraging a Convolutional Long Short-Term Memory (Conv-LSTM) network and a novel pre-processing step involving a Human Segmentation network, our method accentuates human subjects in each frame using segmentation maps. These highlighted frames are then processed by Convolutional Neural Networks (CNNs) to learn feature vectors, without directly combining other modalities with the RGB frames. The learned features are passed to Long Short-Term Memory (LSTM) units, which model the sequential video data and draw meaningful inferences. The proposed methodology is tested on three publicly available datasets: KARD, MSR Daily Activity, and SBU-Interactions. Our approach outperforms similar state-of-the-art methods, achieving accuracy scores exceeding 98% on MSR Daily Activity and 99% on the KARD and SBU-Interactions datasets. In essence, our method not only provides a competitive solution for human activity recognition in video frames but also advances the field by integrating Conv-LSTM networks with innovative pre-processing techniques. The evaluation on multiple datasets underlines the robustness and strong performance of the proposed approach.
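The pre-processing idea in the abstract, accentuating human subjects with segmentation maps before feature extraction, can be illustrated with a minimal NumPy sketch. The abstract does not say how the frames are highlighted, so the dimming scheme, the function names `highlight_subject` and `highlight_clip`, and the `background_gain` parameter are all assumptions made for illustration, not the authors' implementation. The masks are assumed to come from some human segmentation network that is not shown here.

```python
import numpy as np


def highlight_subject(frame, mask, background_gain=0.3):
    """Dim background pixels so the segmented person stands out.

    frame: (H, W, 3) uint8 RGB image.
    mask:  (H, W) array in [0, 1]; 1 marks human pixels (hypothetical
           output of a human segmentation network).
    background_gain: assumed dimming factor, not taken from the paper.
    """
    # Add a channel axis so the mask broadcasts over the RGB channels.
    mask = np.clip(mask.astype(np.float32), 0.0, 1.0)[..., None]
    # Human pixels keep weight 1; background pixels get background_gain.
    weights = mask + (1.0 - mask) * background_gain
    out = frame.astype(np.float32) * weights
    return np.rint(out).astype(np.uint8)


def highlight_clip(frames, masks, background_gain=0.3):
    """Apply highlight_subject to every frame; returns (T, H, W, 3)."""
    return np.stack([
        highlight_subject(f, m, background_gain)
        for f, m in zip(frames, masks)
    ])
```

In a pipeline like the one described, the highlighted clip would then be fed to a CNN feature extractor followed by LSTM units; that stage is omitted because the abstract gives no architectural details.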
computer science, software engineering