Look Less Think More: Rethinking Compositional Action Recognition

Rui Yan,Peng Huang,Xiangbo Shu,Junhao Zhang,Yonghua Pan,Jinhui Tang
DOI: https://doi.org/10.1145/3503161.3547862
2022-01-01
Abstract:Compositional action recognition which aims to identify the unseen combinations of actions and objects has recently attracted wide attention. Conventional methods bring in additional cues (e.g., dynamic motions of objects) to alleviate the inductive bias between the visual appearance of objects and the human action-level labels. Besides, compared with non-compositional settings, previous methods only pursue higher performance in compositional settings, which can not prove their generalization ability. To this end, we firstly rethink the problem and design a more generalized metric (namely Drop Ratio) and a more practical setting to evaluate the compositional generalization of existing action recognition algorithms. Beyond that, we propose a simple yet effective framework, Look Less Think More (LLTM), to reduce the strong association between visual objects and action-level labels (Look Less), and then discover the commonsense relationships between object categories and human actions (Think More). We test the rationality of the proposed Drop Ratio and Practical setting by comparing several popular action recognition methods on SSV2. Besides, the proposed LLTM achieves state-of-the-art performance on SSV2 with different settings.
What problem does this paper attempt to address?