OccludeNet: A Causal Journey into Mixed-View Actor-Centric Video Action Recognition under Occlusions

Guanyu Zhou, Wenxuan Liu, Wenxin Huang, Xuemei Jia, Xian Zhong, Chia-Wen Lin
2024-11-24
Abstract: The lack of occlusion data in commonly used action recognition video datasets limits model robustness and impedes sustained performance improvements. We construct OccludeNet, a large-scale occluded video dataset that includes both real-world and synthetic occlusion scene videos under various natural environments. OccludeNet features dynamic tracking occlusion, static scene occlusion, and multi-view interactive occlusion, addressing existing gaps in data. Our analysis reveals that occlusion impacts action classes differently, with actions involving low scene relevance and partial body visibility experiencing greater accuracy degradation. To overcome the limitations of current occlusion-focused approaches, we propose a structural causal model for occluded scenes and introduce the Causal Action Recognition (CAR) framework, which employs backdoor adjustment and counterfactual reasoning. This framework enhances key actor information, improving model robustness to occlusion. We anticipate that the challenges posed by OccludeNet will stimulate further exploration of causal relations in occlusion scenarios and encourage a reevaluation of class correlations, ultimately promoting sustainable performance improvements. The code and full dataset will be released soon.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses the limited robustness and performance of video action recognition models under occlusion. Specifically:

1. **Limitations of the dataset**: Existing action recognition video datasets lack occlusion scenarios, which restricts model performance in the real world. In particular, model performance declines significantly when actors are partially or completely occluded.
2. **The impact of occlusion on different action categories**: Occlusion affects action categories to different degrees. Actions with low scene relevance or only partial body visibility suffer a more severe decline in accuracy.
3. **Shortcomings of existing methods**: Current de-occlusion models usually focus on minimizing the impact of the occluded area, but they often overlook the interrelationships between scene elements and fail to capture the causal relationships among occluders, backgrounds, visible parts of actors, and predictions.

To overcome these problems, the paper proposes the following solutions:

- **Constructing the OccludeNet dataset**: A large-scale occluded video dataset that includes both real-world and synthetic occlusion videos, covering multiple occlusion types: dynamic tracking occlusion, static scene occlusion, and multi-view interactive occlusion.
- **Introducing the Causal Action Recognition (CAR) framework**: Using a structural causal model (SCM), backdoor adjustment, and counterfactual reasoning, the framework strengthens the model's causal attention to the features of unoccluded actors, thereby improving robustness in occluded environments.
- **Analyzing the impact of occlusion on different action categories**: The study finds that occlusion has a greater impact on actions involving low scene relevance and partial body visibility, underscoring the need for methods tailored to different occlusion strategies.

Through these contributions, the paper aims to advance research on action recognition in occluded environments and to facilitate the application of models in complex real-world scenarios.
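
To make the backdoor adjustment mentioned above more concrete, here is a minimal sketch of the general formula P(Y | do(X)) = Σ_z P(Y | X, z) P(z) on a toy discrete example. It is not the paper's CAR implementation: the variables (`Z` as an occluder indicator, `X` as actor visibility), the function name `backdoor_adjust`, and all probabilities are hypothetical and chosen only for illustration. The final difference is a simple interventional contrast, loosely analogous to the counterfactual comparison the summary describes.

```python
# Toy illustration only: names and probabilities are hypothetical,
# not taken from the paper or its code.

def backdoor_adjust(p_y_given_xz, p_z, x):
    """Return P(Y=1 | do(X=x)) = sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    return sum(p_y_given_xz[(x, z)] * pz for z, pz in p_z.items())

# Z: assumed confounder (1 = occluder present, 0 = absent)
p_z = {0: 0.7, 1: 0.3}

# P(Y=1 | X=x, Z=z), where X=1 means the actor is visible, X=0 masked out
p_y_given_xz = {
    (1, 0): 0.9, (1, 1): 0.6,
    (0, 0): 0.3, (0, 1): 0.2,
}

# Intervene on actor visibility while averaging over the confounder
p_do_visible = backdoor_adjust(p_y_given_xz, p_z, x=1)
p_do_masked = backdoor_adjust(p_y_given_xz, p_z, x=0)

# Contrast between the two interventions: how much the actor itself
# contributes to a correct prediction in this toy setting
actor_effect = p_do_visible - p_do_masked

print(round(p_do_visible, 2), round(p_do_masked, 2), round(actor_effect, 2))
```

Averaging over `p_z` rather than conditioning on the observed occluder is what removes the confounder's influence from the estimate; a real model would replace the lookup table with learned feature-level predictions.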