Achieving Procedure-Aware Instructional Video Correlation Learning Under Weak Supervision from a Collaborative Perspective
Tianyao He,Huabin Liu,Zelin Ni,Yuxi Li,Xiao Ma,Cheng Zhong,Yang Zhang,Yingxue Wang,Weiyao Lin
DOI: https://doi.org/10.1007/s11263-024-02272-8
IF: 13.369
2024-11-06
International Journal of Computer Vision
Abstract:Video Correlation Learning (VCL) delineates a high-level research domain that centers on analyzing the semantic and temporal correspondences between videos through a comparative paradigm. Recently, instructional video-related tasks have drawn increasing attention due to their promising potential. Compared with general videos, instructional videos possess more complex procedure information, making correlation learning quite challenging. To obtain procedural knowledge, current methods rely heavily on fine-grained step-level annotations, which are costly and non-scalable. To improve VCL on instructional videos, we introduce a weakly supervised framework named Collaborative Procedure Alignment (CPA). To be specific, our framework comprises two core components: the collaborative step mining (CSM) module and the frame-to-step alignment (FSA) module. Free of the necessity for step-level annotations, the CSM module can properly conduct temporal step segmentation and pseudo-step learning by exploring the inner procedure correspondences between paired videos. Subsequently, the FSA module efficiently yields the probability of aligning one video's frame-level features with another video's pseudo-step labels, which can act as a reliable correlation degree for paired videos. The two modules are inherently interconnected and can mutually enhance each other to extract the step-level knowledge and measure the video correlation distances accurately. Our framework provides an effective tool for instructional video correlation learning. We instantiate our framework on four representative tasks, including sequence verification, few-shot action recognition, temporal action segmentation, and action quality assessment. Furthermore, we extend our framework to more innovative functions to further exhibit its potential. Extensive and in-depth experiments validate CPA's strong correlation learning capability on instructional videos. The implementation can be found at https://github.com/hotelll/Collaborative_Procedure_Alignment.
computer science, artificial intelligence