Learning Discriminative Representations in Videos via Active Embedding Distance Correlation

Qingsong Zhao, Yi Wang, Yinan He, Yu Qiao, Cairong Zhao
DOI: https://doi.org/10.1109/lsp.2024.3495564
2024-12-18
IEEE Signal Processing Letters
Abstract: In action recognition, models often suffer from representation bias: they focus on background context rather than the action itself, which limits their ability to generalize. Existing methods suggest that differential inputs and dual-path architectures can separate spatial and temporal representations. However, these approaches still rely on spatial hints and struggle to capture fine-grained temporal features. We propose a novel regularization technique, Active Embedding Distance Correlation (AEDC), which is integrated into dual-path networks. AEDC minimizes the distance correlation between temporal and spatial embeddings, so that spatial and temporal information are modeled independently. Our experiments show that AEDC improves performance by 0.6% on SSV2 and 2.4% on TA50 compared to existing dual-path baselines. Ablation studies confirm that AEDC reduces scene bias and improves robustness to variations in the video input.
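The quantity the abstract says AEDC minimizes, the distance correlation between two sets of embeddings, can be illustrated with a minimal sketch. The code below computes the standard sample distance correlation in plain Python. It is not the authors' implementation: the abstract does not describe the "active" component or how the term enters the training loss, so the function names and the toy inputs here are assumptions for illustration only. In practice the same computation would be written in a differentiable framework so it can act as a regularizer.

```python
import math


def pairwise_distances(rows):
    """Euclidean distance matrix between all pairs of embedding vectors."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def double_center(d):
    """Subtract row and column means from a square matrix, add back the grand mean."""
    n = len(d)
    row_mean = [sum(r) / n for r in d]
    col_mean = [sum(d[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_mean) / n
    return [
        [d[i][j] - row_mean[i] - col_mean[j] + grand for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between two batches of embeddings.

    x and y are lists of equal length (the batch size); each element is a
    list of floats. The two embedding dimensions may differ.
    """
    n = len(x)
    a = double_center(pairwise_distances(x))
    b = double_center(pairwise_distances(y))

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2_xy = mean_product(a, b)  # squared sample distance covariance
    dvar2_x = mean_product(a, a)   # squared sample distance variance of x
    dvar2_y = mean_product(b, b)   # squared sample distance variance of y
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        # One batch is constant, so there is no dependence to measure.
        return 0.0
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(dcov2_xy, 0.0) / denom)


# Hypothetical toy embeddings standing in for the spatial and temporal paths.
spatial = [[0.0, 1.0], [1.0, 0.5], [2.0, 2.0], [3.0, 1.5]]
temporal = [[0.3], [1.1], [0.2], [2.4]]
penalty = distance_correlation(spatial, temporal)
```

A value near 0 indicates that the two embedding sets carry little shared information, while a value of 1 indicates that one is an exact linear (similarity) transform of the other. A regularizer in this spirit would add `penalty`, scaled by some weight, to the classification loss.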
engineering, electrical & electronic
What problem does this paper attempt to address?