Clustering-based multi-featured self-supervised learning for human activities and video retrieval
Javed, Muhammad Hafeez; Rajeh, Taha M.
DOI: https://doi.org/10.1007/s10489-024-05460-8
IF: 5.3
2024-05-08
Applied Intelligence
Abstract: Human-centric content-based video retrieval has emerged as a prominent area of research because of its diverse applications. However, the task presents several inherent challenges, including end-to-end image classification and data sampling. Although self-supervised learning methods have made significant progress on these challenges, several issues remain. One major concern is the generation of randomly sampled inverse-complementary pairs, which must be handled carefully to avoid false positives. Moreover, the common assumption that similarity between video clips is solely temporal neglects other factors, such as motion. To address these issues, this paper proposes CMS2L, a clustering-based multi-featured self-supervised learning model. The model introduces a fundamental improvement by fixing intra-class positive sampling to avoid false labeling during stage training caused by looping clusters. It also employs a second stream with an expanded range of features to obtain a more comprehensive representation of actions. Experimental results on benchmark datasets demonstrate the superiority of the proposed model.
computer science, artificial intelligence
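The abstract does not specify how CMS2L clusters clips or selects positives. The following is a rough illustration of the general idea behind cluster-based positive sampling, where clips from the same cluster serve as positives instead of a single randomly generated pair. It is a minimal sketch that assumes clip embeddings are already available as vectors and uses plain k-means with farthest-point initialization as a stand-in clustering step. `kmeans` and `sample_positives` are hypothetical helpers and are not the authors' implementation.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain k-means (a stand-in clustering step); returns one cluster label per row of x."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialization: random first center, then repeatedly
    # pick the row farthest from all centers chosen so far.
    centers = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(x[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # Squared Euclidean distance from every row to every center.
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(0)
    return labels


def sample_positives(labels, seed=0):
    """For each clip i, pick a different clip from the same cluster as its positive.

    A clip that is alone in its cluster falls back to itself, which in practice
    would mean pairing it with an augmented view of the same clip (an assumption).
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    idx = np.arange(len(labels))
    pos = np.empty_like(labels)
    for i, c in enumerate(labels):
        candidates = np.flatnonzero((labels == c) & (idx != i))
        pos[i] = rng.choice(candidates) if len(candidates) else i
    return pos


# Usage: two well-separated groups of synthetic 8-dimensional "clip embeddings".
rng = np.random.default_rng(42)
emb = np.vstack([rng.normal(0.0, 0.1, (5, 8)), rng.normal(5.0, 0.1, (5, 8))])
labels = kmeans(emb, k=2)
pos = sample_positives(labels)
```

Drawing positives only from within a cluster is what keeps a clip from being paired with one from an unrelated group. The quality of the pairs therefore depends entirely on the quality of the clustering, and this toy example makes that easy only because the two groups are far apart.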